2023-04-18 15:49:43

by Lorenzo Stoakes

Subject: [PATCH v4 0/6] remove the vmas parameter from GUP APIs

(pin_/get)_user_pages[_remote]() each provide an optional output parameter
for an array of VMA objects associated with each page in the input range.

These provide the means for VMAs to be returned, as long as mm->mmap_lock
is never released during the GUP operation (i.e. the internal flag
FOLL_UNLOCKABLE is not specified).

In addition, these VMAs may only be accessed while the mmap_lock is held,
and become invalid the moment it is released.

The vast majority of invocations do not use this functionality and of those
that do, all but one retrieve a single VMA to perform checks upon.

In the single VMA cases it is not egregious to simply replace the operation
with a vma_lookup(). In these cases we duplicate the (fast) lookup on a
slow path already under the mmap_lock, abstracted into a new
get_user_page_vma_remote() inline helper function which also performs error
checking and reference count maintenance.
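As a sketch, a caller which previously requested both the page and its VMA
might, with this series applied, use the new helper roughly as follows (a
hypothetical caller for illustration only; the helper's signature is taken
from patch 3 of this series):

```c
/* Hypothetical caller - illustrative only, not code from this series. */
struct vm_area_struct *vma;
struct page *page;

mmap_read_lock(mm);
/* Replaces: get_user_pages_remote(mm, addr, 1, flags, &page, &vma, NULL) */
page = get_user_page_vma_remote(mm, addr, gup_flags, &vma);
if (IS_ERR_OR_NULL(page)) {
	/* No page was pinned or the VMA lookup failed - nothing to put. */
	err = page ? PTR_ERR(page) : -EFAULT;
} else {
	/* Both 'page' (with a reference held) and 'vma' are valid here,
	 * but 'vma' only remains so while mmap_lock is held. */
	do_checks(vma);
	put_page(page);
}
mmap_read_unlock(mm);
```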

The special case is io_uring, where io_pin_pages() specifically needs to
assert that all the VMAs share the same vma->vm_file (possibly NULL) and
are either anonymous or hugetlb mappings.

We adjust io_uring to perform its own VMA lookup, which in most cases will
involve only a single VMA, so the performance cost on an already slow path
should be minimal. In any case, doing so avoids the VMA array allocation
altogether.

In future it is sensible to simply restrict write pinning of file-backed
folios in which case we may be able to simply avoid this check altogether,
but for the time being we maintain it as-is.

Eliminating the vmas parameter eliminates an entire class of possible
dangling pointer errors should the lock have been incorrectly released.
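For illustration, the class of error being removed looks roughly like this
(a hypothetical caller sketched against the old API, not code from this
series):

```c
/* Hypothetical caller using the OLD API - illustrative only. */
struct vm_area_struct *vma;
struct page *page;
int locked = 1;

mmap_read_lock(mm);
/* If the internal FOLL_UNLOCKABLE path drops and reacquires mmap_lock,
 * any VMA pointer it returned is invalidated. */
ret = get_user_pages_remote(mm, addr, 1, gup_flags, &page, &vma, &locked);
...
mmap_read_unlock(mm);

/* BUG: the lock is gone - the VMA may have been freed or merged away,
 * so this dereference is a potential use-after-free. */
if (vma->vm_flags & VM_WRITE)
	...
```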

In addition the API is simplified and now clearly expresses what it is for
- applying the specified GUP flags and (if pinning) returning pinned pages.

This change additionally opens the door to further potential improvements
in GUP and the possible marrying of disparate code paths.

I have run the gup_test and a simple io_uring program which exercises the
use of FOLL_SAME_PAGE with no issues.

This patch series is rebased on mm-unstable as of 17th April.

Thanks to Matthew Wilcox for suggesting this refactoring!

v4:
- Drop FOLL_SAME_FILE as the complexity costs exceed the benefit of having it
for a single case.
- Update io_pin_pages() to perform VMA lookup directly.
- Add get_user_page_vma_remote() to perform the single page/VMA lookup with
error checks performed correctly.

v3:
- Always explicitly handle !vma cases, returning an error to the user where
appropriate (or otherwise indicating that the operation did not completely
succeed), and always emitting a warning since these conditions should be
impossible.
https://lore.kernel.org/linux-mm/[email protected]/

v2:
- Only lookup the VMA if the pin succeeded (other than __access_remote_vm()
which has different semantics)
- Be pedantically careful about ensuring that under no circumstances can we
fail to unpin a page
https://lore.kernel.org/linux-mm/[email protected]/

v1:
https://lore.kernel.org/linux-mm/[email protected]/

Lorenzo Stoakes (6):
mm/gup: remove unused vmas parameter from get_user_pages()
mm/gup: remove unused vmas parameter from pin_user_pages_remote()
mm/gup: remove vmas parameter from get_user_pages_remote()
io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()
mm/gup: remove vmas parameter from pin_user_pages()
mm/gup: remove vmas array from internal GUP functions

arch/arm64/kernel/mte.c | 17 ++--
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/iommu/iommufd/pages.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
fs/exec.c | 2 +-
include/linux/hugetlb.h | 10 +-
include/linux/mm.h | 42 +++++++--
io_uring/rsrc.c | 53 ++++++-----
kernel/events/uprobes.c | 13 +--
mm/gup.c | 105 +++++++--------------
mm/gup_test.c | 14 ++-
mm/hugetlb.c | 24 ++---
mm/memory.c | 14 +--
mm/process_vm_access.c | 2 +-
mm/rmap.c | 2 +-
net/xdp/xdp_umem.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +-
virt/kvm/kvm_main.c | 2 +-
29 files changed, 161 insertions(+), 174 deletions(-)

--
2.40.0


2023-04-18 15:49:45

by Lorenzo Stoakes

Subject: [PATCH v4 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

No invocation of get_user_pages() uses the vmas parameter, so remove
it.

The GUP API is confusing and caveated. Recent changes have done much to
improve that, however there is more we can do. Exporting vmas is a prime
target, as the caller has to be extremely careful to preclude their use
after the mmap_lock has been released or otherwise be left with dangling
pointers.

Removing the vmas parameter focuses the GUP functions upon their primary
purpose - pinning (and outputting) pages as well as performing the actions
implied by the input flags.

This is part of a patch series aiming to remove the vmas parameter
altogether.

Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
include/linux/mm.h | 3 +--
mm/gup.c | 9 +++------
mm/gup_test.c | 5 ++---
virt/kvm/kvm_main.c | 2 +-
7 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 21ca0a831b70..5d390df21440 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
if (!(vma->vm_flags & VM_MAYEXEC))
return -EACCES;

- ret = get_user_pages(src, 1, 0, &src_page, NULL);
+ ret = get_user_pages(src, 1, 0, &src_page);
if (ret < 1)
return -EFAULT;

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 1e8e287e113c..0597540f0dde 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -362,7 +362,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
struct page **pages = ttm->pages + pinned;

r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
- pages, NULL);
+ pages);
if (r < 0)
goto release_pages;

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index b836936e9747..378cf02a2aa1 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
#else
*pageshift = PAGE_SHIFT;
#endif
- if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
+ if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
put_page(page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 37554b08bb28..b14cc4972d0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2380,8 +2380,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
diff --git a/mm/gup.c b/mm/gup.c
index 1f72a717232b..7e454d6b157e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2251,8 +2251,6 @@ long get_user_pages_remote(struct mm_struct *mm,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* This is the same as get_user_pages_remote(), just with a less-flexible
* calling convention where we assume that the mm being operated on belongs to
@@ -2260,16 +2258,15 @@ long get_user_pages_remote(struct mm_struct *mm,
* obviously don't pass FOLL_REMOTE in here.
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;

- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- vmas, &locked, gup_flags);
+ NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);

diff --git a/mm/gup_test.c b/mm/gup_test.c
index 8ae7307a1bb6..9ba8ea23f84e 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case GUP_BASIC_TEST:
- nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_FAST_BENCHMARK:
nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
@@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i, NULL);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
break;
default:
ret = -EINVAL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d255964ec331..7f31e0a4adb5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2474,7 +2474,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;

- rc = get_user_pages(addr, 1, flags, NULL, NULL);
+ rc = get_user_pages(addr, 1, flags, NULL);
return rc == -EHWPOISON;
}

--
2.40.0

2023-04-18 15:50:05

by Lorenzo Stoakes

Subject: [PATCH v4 2/6] mm/gup: remove unused vmas parameter from pin_user_pages_remote()

No invocation of pin_user_pages_remote() uses the vmas parameter, so remove
it. This forms part of a larger patch set eliminating the use of the vmas
parameter altogether.

Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
drivers/iommu/iommufd/pages.c | 4 ++--
drivers/vfio/vfio_iommu_type1.c | 2 +-
include/linux/mm.h | 2 +-
mm/gup.c | 8 +++-----
mm/process_vm_access.c | 2 +-
5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index f8d92c9bb65b..9d55a2188a64 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -786,7 +786,7 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
user->locked = 1;
}
rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
- user->gup_flags, user->upages, NULL,
+ user->gup_flags, user->upages,
&user->locked);
}
if (rc <= 0) {
@@ -1787,7 +1787,7 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
rc = pin_user_pages_remote(
pages->source_mm, (uintptr_t)(pages->uptr + index * PAGE_SIZE),
1, (flags & IOMMUFD_ACCESS_RW_WRITE) ? FOLL_WRITE : 0, &page,
- NULL, NULL);
+ NULL);
mmap_read_unlock(pages->source_mm);
if (rc != 1) {
if (WARN_ON(rc >= 0))
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 493c31de0edb..e6dc8fec3ed5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -562,7 +562,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,

mmap_read_lock(mm);
ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
- pages, NULL, NULL);
+ pages, NULL);
if (ret > 0) {
int i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b14cc4972d0b..ec9875c59f6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2378,7 +2378,7 @@ long get_user_pages_remote(struct mm_struct *mm,
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 7e454d6b157e..931c805bc32b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3093,8 +3093,6 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -3109,14 +3107,14 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, vmas,
+ return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 78dfaf9e8990..0523edab03a6 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -104,7 +104,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
mmap_read_lock(mm);
pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
flags, process_pages,
- NULL, &locked);
+ &locked);
if (locked)
mmap_read_unlock(mm);
if (pinned_pages <= 0)
--
2.40.0

2023-04-18 15:50:12

by Lorenzo Stoakes

Subject: [PATCH v4 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

The only instances of get_user_pages_remote() invocations which used the
vmas parameter were for a single page which can instead simply look up the
VMA directly. In particular:-

- __update_ref_ctr() looked up the VMA but did nothing with it, so we simply
remove the lookup.

- __access_remote_vm() was already using vma_lookup() when the original
lookup failed so by doing the lookup directly this also de-duplicates the
code.

We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().

As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.

This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.

Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/arm64/kernel/mte.c | 17 +++++++++--------
arch/s390/kvm/interrupt.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 34 +++++++++++++++++++++++++++++++---
kernel/events/uprobes.c | 13 +++++--------
mm/gup.c | 12 ++++--------
mm/memory.c | 14 +++++++-------
mm/rmap.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +--
10 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..cc793c246653 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -419,10 +419,9 @@ long get_mte_ctrl(struct task_struct *task)
static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
struct iovec *kiov, unsigned int gup_flags)
{
- struct vm_area_struct *vma;
void __user *buf = kiov->iov_base;
size_t len = kiov->iov_len;
- int ret;
+ int err = 0;
int write = gup_flags & FOLL_WRITE;

if (!access_ok(buf, len))
@@ -432,14 +431,16 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
return -EIO;

while (len) {
+ struct vm_area_struct *vma;
unsigned long tags, offset;
void *maddr;
- struct page *page = NULL;
+ struct page *page = get_user_page_vma_remote(mm, addr,
+ gup_flags, &vma);

- ret = get_user_pages_remote(mm, addr, 1, gup_flags, &page,
- &vma, NULL);
- if (ret <= 0)
+ if (IS_ERR_OR_NULL(page)) {
+ err = page == NULL ? -EIO : PTR_ERR(page);
break;
+ }

/*
* Only copy tags if the page has been mapped as PROT_MTE
@@ -449,7 +450,7 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
* was never mapped with PROT_MTE.
*/
if (!(vma->vm_flags & VM_MTE)) {
- ret = -EOPNOTSUPP;
+ err = -EOPNOTSUPP;
put_page(page);
break;
}
@@ -482,7 +483,7 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
kiov->iov_len = buf - kiov->iov_base;
if (!kiov->iov_len) {
/* check for error accessing the tracee's address space */
- if (ret <= 0)
+ if (err)
return -EIO;
else
return -EFAULT;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 9250fde1f97d..c19d0cb7d2f2 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2777,7 +2777,7 @@ static struct page *get_map_page(struct kvm *kvm, u64 uaddr)

mmap_read_lock(kvm->mm);
get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(kvm->mm);
return page;
}
diff --git a/fs/exec.c b/fs/exec.c
index 87cf3a2f0e9a..d8d48ee15aac 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -219,7 +219,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ec9875c59f6d..0c236e2f25e2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2364,6 +2364,9 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
unmap_mapping_range(mapping, holebegin, holelen, 0);
}

+static inline struct vm_area_struct *vma_lookup(struct mm_struct *mm,
+ unsigned long addr);
+
extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
void *buf, int len, unsigned int gup_flags);
extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
@@ -2372,13 +2375,38 @@ extern int __access_remote_vm(struct mm_struct *mm, unsigned long addr,
void *buf, int len, unsigned int gup_flags);

long get_user_pages_remote(struct mm_struct *mm,
- unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ int *locked);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
int *locked);
+
+static inline struct page *get_user_page_vma_remote(struct mm_struct *mm,
+ unsigned long addr,
+ int gup_flags,
+ struct vm_area_struct **vmap)
+{
+ struct page *page;
+ struct vm_area_struct *vma;
+ int got = get_user_pages_remote(mm, addr, 1, gup_flags, &page, NULL);
+
+ if (got < 0)
+ return ERR_PTR(got);
+ if (got == 0)
+ return NULL;
+
+ vma = vma_lookup(mm, addr);
+ if (WARN_ON_ONCE(!vma)) {
+ put_page(page);
+ return ERR_PTR(-EINVAL);
+ }
+
+ *vmap = vma;
+ return page;
+}
+
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 59887c69d54c..cac3aef7c6f7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -365,7 +365,6 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
{
void *kaddr;
struct page *page;
- struct vm_area_struct *vma;
int ret;
short *ptr;

@@ -373,7 +372,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
return -EINVAL;

ret = get_user_pages_remote(mm, vaddr, 1,
- FOLL_WRITE, &page, &vma, NULL);
+ FOLL_WRITE, &page, NULL);
if (unlikely(ret <= 0)) {
/*
* We are asking for 1 page. If get_user_pages_remote() fails,
@@ -474,10 +473,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (is_register)
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
- ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
- &old_page, &vma, NULL);
- if (ret <= 0)
- return ret;
+ old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
+ if (IS_ERR_OR_NULL(old_page))
+ return PTR_ERR(old_page);

ret = verify_opcode(old_page, vaddr, &opcode);
if (ret <= 0)
@@ -2027,8 +2025,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* but we treat this as a 'remote' access since it is
* essentially a kernel access to the memory.
*/
- result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page,
- NULL, NULL);
+ result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
if (result < 0)
return result;

diff --git a/mm/gup.c b/mm/gup.c
index 931c805bc32b..9440aa54c741 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2165,8 +2165,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -2181,8 +2179,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held for read or write.
*
* get_user_pages_remote walks a process's page tables and takes a reference
@@ -2219,15 +2215,15 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;

- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2237,7 +2233,7 @@ EXPORT_SYMBOL(get_user_pages_remote);
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 8ddb10199e8d..61b7192acf98 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5577,7 +5577,6 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
int len, unsigned int gup_flags)
{
- struct vm_area_struct *vma;
void *old_buf = buf;
int write = gup_flags & FOLL_WRITE;

@@ -5586,13 +5585,15 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,

/* ignore errors, just check how much was successfully transferred */
while (len) {
- int bytes, ret, offset;
+ int bytes, offset;
void *maddr;
- struct page *page = NULL;
+ struct vm_area_struct *vma;
+ struct page *page = get_user_page_vma_remote(mm, addr,
+ gup_flags, &vma);
+
+ if (IS_ERR_OR_NULL(page)) {
+ int ret = 0;

- ret = get_user_pages_remote(mm, addr, 1,
- gup_flags, &page, &vma, NULL);
- if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
#else
@@ -5600,7 +5601,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
* Check if this is a VM_IO | VM_PFNMAP VMA, which
* we can access using slightly different code.
*/
- vma = vma_lookup(mm, addr);
if (!vma)
break;
if (vma->vm_ops && vma->vm_ops->access)
diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..756ea8a9bb90 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2324,7 +2324,7 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,

npages = get_user_pages_remote(mm, start, npages,
FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
- pages, NULL, NULL);
+ pages, NULL);
if (npages < 0)
return npages;

diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 31af29f669d2..ac20c0bdff9d 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -916,7 +916,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1,
- FOLL_FORCE, &page, NULL, NULL);
+ FOLL_FORCE, &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return false;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 9bfe1d6f6529..e033c79d528e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -61,8 +61,7 @@ static void async_pf_execute(struct work_struct *work)
* access remotely.
*/
mmap_read_lock(mm);
- get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, NULL,
- &locked);
+ get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
if (locked)
mmap_read_unlock(mm);

--
2.40.0

2023-04-18 15:50:15

by Lorenzo Stoakes

Subject: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

We are shortly to remove the vmas parameter from pin_user_pages(), and will
instead perform the required VMA checks ourselves. In most cases there will
be a single VMA so this should cause no undue impact on an already slow
path.

Doing this eliminates the one instance of vmas being used by
pin_user_pages().

Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++---------------------
1 file changed, 31 insertions(+), 24 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 7a43aed8e395..3a927df9d913 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
return ret;
}

+static int check_vmas_locked(unsigned long addr, unsigned long len)
+{
+ struct file *file;
+ VMA_ITERATOR(vmi, current->mm, addr);
+ struct vm_area_struct *vma = vma_next(&vmi);
+ unsigned long end = addr + len;
+
+ if (WARN_ON_ONCE(!vma))
+ return -EINVAL;
+
+ file = vma->vm_file;
+ if (file && !is_file_hugepages(file))
+ return -EOPNOTSUPP;
+
+ /* don't support file backed memory */
+ for_each_vma_range(vmi, vma, end) {
+ if (vma->vm_file != file)
+ return -EINVAL;
+
+ if (file && !vma_is_shmem(vma))
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
- struct vm_area_struct **vmas = NULL;
struct page **pages = NULL;
- int i, pret, ret = -ENOMEM;
+ int pret, ret = -ENOMEM;

end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
start = ubuf >> PAGE_SHIFT;
@@ -1153,31 +1178,14 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
if (!pages)
goto done;

- vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
- GFP_KERNEL);
- if (!vmas)
- goto done;
-
ret = 0;
mmap_read_lock(current->mm);
+
pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, vmas);
- if (pret == nr_pages) {
- struct file *file = vmas[0]->vm_file;
+ pages, NULL);

- /* don't support file backed memory */
- for (i = 0; i < nr_pages; i++) {
- if (vmas[i]->vm_file != file) {
- ret = -EINVAL;
- break;
- }
- if (!file)
- continue;
- if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
- ret = -EOPNOTSUPP;
- break;
- }
- }
+ if (pret == nr_pages) {
+ ret = check_vmas_locked(ubuf, len);
*npages = nr_pages;
} else {
ret = pret < 0 ? pret : -EFAULT;
@@ -1194,7 +1202,6 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
}
ret = 0;
done:
- kvfree(vmas);
if (ret < 0) {
kvfree(pages);
pages = ERR_PTR(ret);
--
2.40.0

2023-04-18 15:52:16

by Lorenzo Stoakes

Subject: [PATCH v4 5/6] mm/gup: remove vmas parameter from pin_user_pages()

We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.

This clears the way to removing the vmas parameter from GUP altogether.

Acked-by: David Hildenbrand <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]> (for qib)
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
include/linux/mm.h | 3 +--
io_uring/rsrc.c | 4 +---
mm/gup.c | 9 +++------
mm/gup_test.c | 9 ++++-----
net/xdp/xdp_umem.c | 2 +-
12 files changed, 17 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 81d7185e2ae8..d19fb1f3007d 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -105,7 +105,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,

ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
FOLL_WRITE | FOLL_LONGTERM,
- mem->hpages + entry, NULL);
+ mem->hpages + entry);
if (ret == n) {
pinned += n;
continue;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index f693bc753b6b..1bb7507325bc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -111,7 +111,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
ret = pin_user_pages(start_page + got * PAGE_SIZE,
num_pages - got,
FOLL_LONGTERM | FOLL_WRITE,
- p + got, NULL);
+ p + got);
if (ret < 0) {
mmap_read_unlock(current->mm);
goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 2a5cac2658ec..84e0f41e7dfa 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -140,7 +140,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
ret = pin_user_pages(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof(struct page *)),
- gup_flags, page_list, NULL);
+ gup_flags, page_list);

if (ret < 0)
goto out;
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index f51ab2ccf151..e6e25f15567d 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -422,7 +422,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
umem->page_chunk[i].plist = plist;
while (nents) {
rv = pin_user_pages(first_page_va, nents, foll_flags,
- plist, NULL);
+ plist);
if (rv < 0)
goto out_sem_up;

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 53001532e8e3..405b89ea1054 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -180,7 +180,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
data, size, dma->nr_pages);

err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags,
- dma->pages, NULL);
+ dma->pages);

if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0c3b48616a9f..1f80254604f0 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -995,7 +995,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
goto out;

pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
- page_list, NULL);
+ page_list);
if (pinned != npages) {
ret = pinned < 0 ? pinned : -ENOMEM;
goto out;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..4317128c1c62 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -952,7 +952,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
while (npages) {
sz2pin = min_t(unsigned long, npages, list_size);
pinned = pin_user_pages(cur_base, sz2pin,
- gup_flags, page_list, NULL);
+ gup_flags, page_list);
if (sz2pin != pinned) {
if (pinned < 0) {
ret = pinned;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0c236e2f25e2..d02c42d37bfc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2410,8 +2410,7 @@ static inline struct page *get_user_page_vma_remote(struct mm_struct *mm,
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 3a927df9d913..82d90b97c17b 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1180,10 +1180,8 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)

ret = 0;
mmap_read_lock(current->mm);
-
pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, NULL);
-
+ pages);
if (pret == nr_pages) {
ret = check_vmas_locked(ubuf, len);
*npages = nr_pages;
diff --git a/mm/gup.c b/mm/gup.c
index 9440aa54c741..dffbfa623aae 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3124,8 +3124,6 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and
* FOLL_PIN is set.
@@ -3134,15 +3132,14 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* see Documentation/core-api/pin_user_pages.rst for details.
*/
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;

- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, vmas, &locked, gup_flags);
+ pages, NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);

diff --git a/mm/gup_test.c b/mm/gup_test.c
index 9ba8ea23f84e..1668ce0e0783 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -146,18 +146,17 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case PIN_BASIC_TEST:
- nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_LONGTERM_BENCHMARK:
nr = pin_user_pages(addr, nr,
gup->gup_flags | FOLL_LONGTERM,
- pages + i, NULL);
+ pages + i);
break;
case DUMP_USER_PAGES_TEST:
if (gup->test_flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
nr = pin_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
pages + i);
@@ -270,7 +269,7 @@ static inline int pin_longterm_test_start(unsigned long arg)
gup_flags, pages);
else
cur_pages = pin_user_pages(addr, remaining_pages,
- gup_flags, pages, NULL);
+ gup_flags, pages);
if (cur_pages < 0) {
pin_longterm_test_stop();
ret = cur_pages;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 02207e852d79..06cead2b8e34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -103,7 +103,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)

mmap_read_lock(current->mm);
npgs = pin_user_pages(address, umem->npgs,
- gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+ gup_flags | FOLL_LONGTERM, &umem->pgs[0]);
mmap_read_unlock(current->mm);

if (npgs != umem->npgs) {
--
2.40.0

2023-04-18 15:52:51

by Lorenzo Stoakes

Subject: [PATCH v4 6/6] mm/gup: remove vmas array from internal GUP functions

Now that we have eliminated all callers of GUP APIs which use the vmas
parameter, eliminate it altogether.

This eliminates a class of bugs where vmas might have been kept around
longer than the mmap_lock, and means we need no longer be concerned about
the lock being dropped during this operation leaving behind dangling
pointers.

This simplifies the GUP API and makes its purpose considerably clearer:
follow flags are applied and, if pinning, an array of pages is returned.
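To make the calling-convention change concrete, here is a minimal userspace sketch (not kernel code — the `struct page` type and the stub body are purely illustrative stand-ins) of the simplified signature, where the pages array is the only out-parameter and there is no vmas array to keep alive:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative stand-in only: in the kernel, struct page describes a
 * physical page frame; here it is just an opaque placeholder.
 */
struct page { unsigned long pfn; };

/*
 * Userspace model of the simplified signature: note there is no
 * struct vm_area_struct **vmas parameter any more. The stub simply
 * pretends every page in the range was successfully pinned.
 */
static long pin_user_pages(unsigned long start, unsigned long nr_pages,
			   unsigned int gup_flags, struct page **pages)
{
	for (unsigned long i = 0; i < nr_pages; i++)
		pages[i] = NULL; /* the real kernel fills in pinned pages */
	return (long)nr_pages;
}
```

A caller now simply passes the pages array — `pinned = pin_user_pages(addr, n, flags, pages);` — with no trailing `NULL` vmas argument, which is exactly the mechanical change every hunk below makes.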

Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
include/linux/hugetlb.h | 10 ++---
mm/gup.c | 83 +++++++++++++++--------------------------
mm/hugetlb.c | 24 +++++-------
3 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 28703fe22386..2735e7a2b998 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -141,9 +141,8 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
- struct page **, struct vm_area_struct **,
- unsigned long *, unsigned long *, long, unsigned int,
- int *);
+ struct page **, unsigned long *, unsigned long *,
+ long, unsigned int, int *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
@@ -297,9 +296,8 @@ static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,

static inline long follow_hugetlb_page(struct mm_struct *mm,
struct vm_area_struct *vma, struct page **pages,
- struct vm_area_struct **vmas, unsigned long *position,
- unsigned long *nr_pages, long i, unsigned int flags,
- int *nonblocking)
+ unsigned long *position, unsigned long *nr_pages,
+ long i, unsigned int flags, int *nonblocking)
{
BUG();
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index dffbfa623aae..7d548d5a21a6 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1024,8 +1024,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: whether we're still with the mmap_lock held
*
* Returns either number of pages pinned (which may be less than the
@@ -1039,8 +1037,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held. It may be released. See below.
*
* __get_user_pages walks a process's page tables and takes a reference to
@@ -1076,7 +1072,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -1116,9 +1112,9 @@ static long __get_user_pages(struct mm_struct *mm,
goto out;

if (is_vm_hugetlb_page(vma)) {
- i = follow_hugetlb_page(mm, vma, pages, vmas,
- &start, &nr_pages, i,
- gup_flags, locked);
+ i = follow_hugetlb_page(mm, vma, pages,
+ &start, &nr_pages, i,
+ gup_flags, locked);
if (!*locked) {
/*
* We've got a VM_FAULT_RETRY
@@ -1183,10 +1179,6 @@ static long __get_user_pages(struct mm_struct *mm,
ctx.page_mask = 0;
}
next_page:
- if (vmas) {
- vmas[i] = vma;
- ctx.page_mask = 0;
- }
page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
@@ -1341,7 +1333,6 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int flags)
{
@@ -1379,7 +1370,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
pages_done = 0;
for (;;) {
ret = __get_user_pages(mm, start, nr_pages, flags, pages,
- vmas, locked);
+ locked);
if (!(flags & FOLL_UNLOCKABLE)) {
/* VM_FAULT_RETRY couldn't trigger, bypass */
pages_done = ret;
@@ -1443,7 +1434,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,

*locked = 1;
ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
- pages, NULL, locked);
+ pages, locked);
if (!*locked) {
/* Continue to retry until we succeeded */
BUG_ON(ret != 0);
@@ -1541,7 +1532,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* not result in a stack expansion that recurses back here.
*/
ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked ? locked : &local_locked);
+ NULL, locked ? locked : &local_locked);
lru_add_drain();
return ret;
}
@@ -1599,7 +1590,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
return -EINVAL;

ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked);
+ NULL, locked);
lru_add_drain();
return ret;
}
@@ -1667,8 +1658,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
#else /* CONFIG_MMU */
static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
- struct vm_area_struct **vmas, int *locked,
- unsigned int foll_flags)
+ int *locked, unsigned int foll_flags)
{
struct vm_area_struct *vma;
bool must_unlock = false;
@@ -1712,8 +1702,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
if (pages[i])
get_page(pages[i]);
}
- if (vmas)
- vmas[i] = vma;
+
start = (start + PAGE_SIZE) & PAGE_MASK;
}

@@ -1894,8 +1883,7 @@ struct page *get_dump_page(unsigned long addr)
int locked = 0;
int ret;

- ret = __get_user_pages_locked(current->mm, addr, 1, &page, NULL,
- &locked,
+ ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
FOLL_FORCE | FOLL_DUMP | FOLL_GET);
return (ret == 1) ? page : NULL;
}
@@ -2068,7 +2056,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int gup_flags)
{
@@ -2076,13 +2063,13 @@ static long __gup_longterm_locked(struct mm_struct *mm,
long rc, nr_pinned_pages;

if (!(gup_flags & FOLL_LONGTERM))
- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked, gup_flags);

flags = memalloc_pin_save();
do {
nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
- pages, vmas, locked,
+ pages, locked,
gup_flags);
if (nr_pinned_pages <= 0) {
rc = nr_pinned_pages;
@@ -2100,9 +2087,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
* Check that the given flags are valid for the exported gup/pup interface, and
* update them with the required flags that the caller must have set.
*/
-static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
- int *locked, unsigned int *gup_flags_p,
- unsigned int to_set)
+static bool is_valid_gup_args(struct page **pages, int *locked,
+ unsigned int *gup_flags_p, unsigned int to_set)
{
unsigned int gup_flags = *gup_flags_p;

@@ -2144,13 +2130,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
(gup_flags & FOLL_PCI_P2PDMA)))
return false;

- /*
- * Can't use VMAs with locked, as locked allows GUP to unlock
- * which invalidates the vmas array
- */
- if (WARN_ON_ONCE(vmas && (gup_flags & FOLL_UNLOCKABLE)))
- return false;
-
*gup_flags_p = gup_flags;
return true;
}
@@ -2219,11 +2198,11 @@ long get_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;

- return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2258,11 +2237,11 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);

@@ -2286,12 +2265,12 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_TOUCH | FOLL_UNLOCKABLE))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages_unlocked);

@@ -2974,7 +2953,7 @@ static int internal_get_user_pages_fast(unsigned long start,
start += nr_pinned << PAGE_SHIFT;
pages += nr_pinned;
ret = __gup_longterm_locked(current->mm, start, nr_pages - nr_pinned,
- pages, NULL, &locked,
+ pages, &locked,
gup_flags | FOLL_TOUCH | FOLL_UNLOCKABLE);
if (ret < 0) {
/*
@@ -3016,7 +2995,7 @@ int get_user_pages_fast_only(unsigned long start, int nr_pages,
* FOLL_FAST_ONLY is required in order to match the API description of
* this routine: no fall back to regular ("slow") GUP.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_GET | FOLL_FAST_ONLY))
return -EINVAL;

@@ -3049,7 +3028,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
* FOLL_GET, because gup fast is always a "pin with a +1 page refcount"
* request.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_GET))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_GET))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3074,7 +3053,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
int pin_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages)
{
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3107,10 +3086,10 @@ long pin_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -3136,10 +3115,10 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, NULL, &locked, gup_flags);
+ pages, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);

@@ -3153,11 +3132,11 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_UNLOCKABLE))
return 0;

- return __gup_longterm_locked(current->mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(current->mm, start, nr_pages, pages,
&locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages_unlocked);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7747160f6de8..ae9a08c1fdde 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6380,17 +6380,14 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
}
#endif /* CONFIG_USERFAULTFD */

-static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
- int refs, struct page **pages,
- struct vm_area_struct **vmas)
+static void record_subpages(struct page *page, struct vm_area_struct *vma,
+ int refs, struct page **pages)
{
int nr;

for (nr = 0; nr < refs; nr++) {
if (likely(pages))
pages[nr] = nth_page(page, nr);
- if (vmas)
- vmas[nr] = vma;
}
}

@@ -6463,9 +6460,9 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
}

long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
- struct page **pages, struct vm_area_struct **vmas,
- unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *locked)
+ struct page **pages, unsigned long *position,
+ unsigned long *nr_pages, long i, unsigned int flags,
+ int *locked)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
@@ -6593,7 +6590,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* If subpage information not requested, update counters
* and skip the same_page loop below.
*/
- if (!pages && !vmas && !pfn_offset &&
+ if (!pages && !pfn_offset &&
(vaddr + huge_page_size(h) < vma->vm_end) &&
(remainder >= pages_per_huge_page(h))) {
vaddr += huge_page_size(h);
@@ -6608,11 +6605,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);

- if (pages || vmas)
- record_subpages_vmas(nth_page(page, pfn_offset),
- vma, refs,
- likely(pages) ? pages + i : NULL,
- vmas ? vmas + i : NULL);
+ if (pages)
+ record_subpages(nth_page(page, pfn_offset),
+ vma, refs,
+ likely(pages) ? pages + i : NULL);

if (pages) {
/*
--
2.40.0

2023-04-18 16:04:59

by David Hildenbrand

Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 18.04.23 17:49, Lorenzo Stoakes wrote:
> We are shortly to remove the vmas parameter from pin_user_pages(), and
> instead perform the required VMA checks ourselves. In most cases there
> will be a single VMA, so this should cause no undue impact on an already
> slow path.
>
> Doing this eliminates the one instance of vmas being used by
> pin_user_pages().
>
> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++---------------------
> 1 file changed, 31 insertions(+), 24 deletions(-)
>
> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 7a43aed8e395..3a927df9d913 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
> return ret;
> }
>
> +static int check_vmas_locked(unsigned long addr, unsigned long len)

TBH, the whole "_locked" suffix is a bit confusing.

I was wondering why you'd want to check whether the VMAs are locked ...

> +{
> + struct file *file;
> + VMA_ITERATOR(vmi, current->mm, addr);
> + struct vm_area_struct *vma = vma_next(&vmi);
> + unsigned long end = addr + len;
> +
> + if (WARN_ON_ONCE(!vma))
> + return -EINVAL;
> +
> + file = vma->vm_file;
> + if (file && !is_file_hugepages(file))
> + return -EOPNOTSUPP;

You'd now be rejecting vma_is_shmem() here, no?


--
Thanks,

David / dhildenb

2023-04-18 16:06:48

by David Hildenbrand

Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 18.04.23 17:55, David Hildenbrand wrote:
> On 18.04.23 17:49, Lorenzo Stoakes wrote:
>> We are shortly to remove the vmas parameter from pin_user_pages(), and
>> instead perform the required VMA checks ourselves. In most cases there
>> will be a single VMA, so this should cause no undue impact on an already
>> slow path.
>>
>> Doing this eliminates the one instance of vmas being used by
>> pin_user_pages().
>>
>> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
>> Signed-off-by: Lorenzo Stoakes <[email protected]>
>> ---
>> io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++---------------------
>> 1 file changed, 31 insertions(+), 24 deletions(-)
>>
>> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
>> index 7a43aed8e395..3a927df9d913 100644
>> --- a/io_uring/rsrc.c
>> +++ b/io_uring/rsrc.c
>> @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
>> return ret;
>> }
>>
>> +static int check_vmas_locked(unsigned long addr, unsigned long len)
>
> TBH, the whole "_locked" suffix is a bit confusing.
>
> I was wondering why you'd want to check whether the VMAs are locked ...

FWIW, "check_vmas_compatible_locked" might be better. But the "_locked"
is still annoying.

--
Thanks,

David / dhildenb

2023-04-18 16:26:15

by Lorenzo Stoakes

Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Tue, Apr 18, 2023 at 05:55:48PM +0200, David Hildenbrand wrote:
> On 18.04.23 17:49, Lorenzo Stoakes wrote:
> > We are shortly to remove the vmas parameter from pin_user_pages(), and
> > instead perform the required VMA checks ourselves. In most cases there
> > will be a single VMA, so this should cause no undue impact on an already
> > slow path.
> >
> > Doing this eliminates the one instance of vmas being used by
> > pin_user_pages().
> >
> > Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
> > ---
> > io_uring/rsrc.c | 55 ++++++++++++++++++++++++++++---------------------
> > 1 file changed, 31 insertions(+), 24 deletions(-)
> >
> > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > index 7a43aed8e395..3a927df9d913 100644
> > --- a/io_uring/rsrc.c
> > +++ b/io_uring/rsrc.c
> > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
> > return ret;
> > }
> > +static int check_vmas_locked(unsigned long addr, unsigned long len)
>
> TBH, the whole "_locked" suffix is a bit confusing.
>
> I was wondering why you'd want to check whether the VMAs are locked ...
>

Yeah it's annoying partly because GUP itself is super inconsistent about
it. Idea is to try to indicate that you need to hold mmap_lock
obviously... let's drop _locked then since we're inconsistent with it anyway.

> > +{
> > + struct file *file;
> > + VMA_ITERATOR(vmi, current->mm, addr);
> > + struct vm_area_struct *vma = vma_next(&vmi);
> > + unsigned long end = addr + len;
> > +
> > + if (WARN_ON_ONCE(!vma))
> > + return -EINVAL;
> > +
> > + file = vma->vm_file;
> > + if (file && !is_file_hugepages(file))
> > + return -EOPNOTSUPP;
>
> You'd now be rejecting vma_is_shmem() here, no?
>

Good spot, I guess I was confused that io_uring would actually want to do
that... not sure who'd actually want to map some shmem for this purpose!

I'll update to make it match the existing code.

>
> --
> Thanks,
>
> David / dhildenb
>

To avoid spam, here's a -fix patch for both:-

----8<----
From 62838d66ee01e631c7c8aa3848b6892d1478c5b6 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <[email protected]>
Date: Tue, 18 Apr 2023 17:11:01 +0100
Subject: [PATCH] io_uring: rsrc: avoid use of vmas parameter in
pin_user_pages()

Rename function to avoid confusion and correct shmem check as suggested by
David.
---
io_uring/rsrc.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 3a927df9d913..483b975e31b3 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1138,7 +1138,7 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
return ret;
}

-static int check_vmas_locked(unsigned long addr, unsigned long len)
+static int check_vmas_compatible(unsigned long addr, unsigned long len)
{
struct file *file;
VMA_ITERATOR(vmi, current->mm, addr);
@@ -1149,15 +1149,16 @@ static int check_vmas_locked(unsigned long addr, unsigned long len)
return -EINVAL;

file = vma->vm_file;
- if (file && !is_file_hugepages(file))
- return -EOPNOTSUPP;

/* don't support file backed memory */
for_each_vma_range(vmi, vma, end) {
if (vma->vm_file != file)
return -EINVAL;

- if (file && !vma_is_shmem(vma))
+ if (!file)
+ continue;
+
+ if (!vma_is_shmem(vma) && !is_file_hugepages(file))
return -EOPNOTSUPP;
}

@@ -1185,7 +1186,7 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
pages, NULL);

if (pret == nr_pages) {
- ret = check_vmas_locked(ubuf, len);
+ ret = check_vmas_compatible(ubuf, len);
*npages = nr_pages;
} else {
ret = pret < 0 ? pret : -EFAULT;
--
2.40.0
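The compatibility rule the -fix patch above arrives at can be modelled in isolation. The following is a hedged userspace sketch (not kernel code — the simplified `struct vma`, the boolean fields, and the hard-coded errno values are stand-ins for `vm_file`, `vma_is_shmem()`, `is_file_hugepages()`, `-EINVAL` and `-EOPNOTSUPP`): every VMA in the range must share the same `vm_file`, and file-backed VMAs are accepted only when shmem- or hugetlb-backed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the fields the real check consults. */
struct vma {
	const char *vm_file;	/* NULL for anonymous memory */
	bool is_shmem;		/* models vma_is_shmem(vma) */
	bool is_hugepages;	/* models is_file_hugepages(file) */
};

static int check_vmas_compatible(const struct vma *vmas, size_t n)
{
	const char *file;

	if (n == 0)
		return -22;	/* -EINVAL: no VMA covers the range */

	file = vmas[0].vm_file;
	for (size_t i = 0; i < n; i++) {
		/* All VMAs in the range must share the same backing file. */
		if (vmas[i].vm_file != file)
			return -22;		/* -EINVAL */
		if (!file)
			continue;		/* anonymous memory is fine */
		/* File-backed only acceptable for shmem or hugetlbfs. */
		if (!vmas[i].is_shmem && !vmas[i].is_hugepages)
			return -95;		/* -EOPNOTSUPP */
	}
	return 0;
}
```

So a purely anonymous range passes, a regular file-backed mapping is rejected with `-EOPNOTSUPP`, and a range spanning VMAs with differing `vm_file` pointers is rejected with `-EINVAL` — matching the behaviour the corrected `check_vmas_compatible()` above implements with a VMA iterator.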

2023-04-19 08:47:47

by Christian König

Subject: Re: [PATCH v4 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

Am 18.04.23 um 17:49 schrieb Lorenzo Stoakes:
> No invocation of get_user_pages() uses the vmas parameter, so remove
> it.
>
> The GUP API is confusing and caveated. Recent changes have done much to
> improve that, however there is more we can do. Exporting vmas is a prime
> target as the caller has to be extremely careful to preclude their use
> after the mmap_lock has expired or otherwise be left with dangling
> pointers.
>
> Removing the vmas parameter focuses the GUP functions upon their primary
> purpose - pinning (and outputting) pages as well as performing the actions
> implied by the input flags.
>
> This is part of a patch series aiming to remove the vmas parameter
> altogether.

Astonishing that there are so few users of the original get_user_pages()
API left.

>
> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
> Acked-by: Greg Kroah-Hartman <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Acked-by: Christian König <[email protected]> for the radeon parts.

Regards,
Christian.

> ---
> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
> drivers/misc/sgi-gru/grufault.c | 2 +-
> include/linux/mm.h | 3 +--
> mm/gup.c | 9 +++------
> mm/gup_test.c | 5 ++---
> virt/kvm/kvm_main.c | 2 +-
> 7 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 21ca0a831b70..5d390df21440 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
> if (!(vma->vm_flags & VM_MAYEXEC))
> return -EACCES;
>
> - ret = get_user_pages(src, 1, 0, &src_page, NULL);
> + ret = get_user_pages(src, 1, 0, &src_page);
> if (ret < 1)
> return -EFAULT;
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 1e8e287e113c..0597540f0dde 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -362,7 +362,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
> struct page **pages = ttm->pages + pinned;
>
> r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> - pages, NULL);
> + pages);
> if (r < 0)
> goto release_pages;
>
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index b836936e9747..378cf02a2aa1 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> #else
> *pageshift = PAGE_SHIFT;
> #endif
> - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> + if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
> return -EFAULT;
> *paddr = page_to_phys(page);
> put_page(page);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 37554b08bb28..b14cc4972d0b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2380,8 +2380,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
> unsigned int gup_flags, struct page **pages,
> struct vm_area_struct **vmas, int *locked);
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas);
> + unsigned int gup_flags, struct page **pages);
> long pin_user_pages(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> struct vm_area_struct **vmas);
> diff --git a/mm/gup.c b/mm/gup.c
> index 1f72a717232b..7e454d6b157e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2251,8 +2251,6 @@ long get_user_pages_remote(struct mm_struct *mm,
> * @pages: array that receives pointers to the pages pinned.
> * Should be at least nr_pages long. Or NULL, if caller
> * only intends to ensure the pages are faulted in.
> - * @vmas: array of pointers to vmas corresponding to each page.
> - * Or NULL if the caller does not require them.
> *
> * This is the same as get_user_pages_remote(), just with a less-flexible
> * calling convention where we assume that the mm being operated on belongs to
> @@ -2260,16 +2258,15 @@ long get_user_pages_remote(struct mm_struct *mm,
> * obviously don't pass FOLL_REMOTE in here.
> */
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas)
> + unsigned int gup_flags, struct page **pages)
> {
> int locked = 1;
>
> - if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
> + if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
> return -EINVAL;
>
> return __get_user_pages_locked(current->mm, start, nr_pages, pages,
> - vmas, &locked, gup_flags);
> + NULL, &locked, gup_flags);
> }
> EXPORT_SYMBOL(get_user_pages);
>
> diff --git a/mm/gup_test.c b/mm/gup_test.c
> index 8ae7307a1bb6..9ba8ea23f84e 100644
> --- a/mm/gup_test.c
> +++ b/mm/gup_test.c
> @@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
> pages + i);
> break;
> case GUP_BASIC_TEST:
> - nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
> - NULL);
> + nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
> break;
> case PIN_FAST_BENCHMARK:
> nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
> @@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
> pages + i, NULL);
> else
> nr = get_user_pages(addr, nr, gup->gup_flags,
> - pages + i, NULL);
> + pages + i);
> break;
> default:
> ret = -EINVAL;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d255964ec331..7f31e0a4adb5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2474,7 +2474,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> {
> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>
> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> + rc = get_user_pages(addr, 1, flags, NULL);
> return rc == -EHWPOISON;
> }
>

2023-04-19 12:07:27

by Janosch Frank

[permalink] [raw]
Subject: Re: [PATCH v4 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

On 4/18/23 17:49, Lorenzo Stoakes wrote:
> The only instances of get_user_pages_remote() invocations which used the
> vmas parameter were for a single page which can instead simply look up the
> VMA directly. In particular:-
>
> - __update_ref_ctr() looked up the VMA but did nothing with it, so we
> simply remove the lookup.
>
> - __access_remote_vm() was already using vma_lookup() when the original
> lookup failed, so by doing the lookup directly this also de-duplicates the
> code.
>
> We are able to perform these VMA operations as we already hold the
> mmap_lock in order to be able to call get_user_pages_remote().
>
> As part of this work we add get_user_page_vma_remote() which abstracts the
> VMA lookup, error handling and decrementing the page reference count should
> the VMA lookup fail.
>
> This forms part of a broader set of patches intended to eliminate the vmas
> parameter altogether.
>
> Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
> Acked-by: David Hildenbrand <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---

For the s390 part:
Reviewed-by: Janosch Frank <[email protected]>

2023-04-19 16:44:28

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
> We are shortly to remove the vmas parameter from pin_user_pages(), and
> instead perform the required VMA checks ourselves. In most cases there will
> be a single VMA so this should cause no undue impact on an already slow path.
>
> Doing this eliminates the one instance of vmas being used by
> pin_user_pages().

First up, please don't just send single patches from a series. It's
really annoying when you are trying to get the full picture. Just CC the
whole series, so reviewers don't have to look it up separately.

So when you're doing a respin for what I'll mention below and the issue
that David found, please don't just show us patch 4+5 of the series.

> diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> index 7a43aed8e395..3a927df9d913 100644
> --- a/io_uring/rsrc.c
> +++ b/io_uring/rsrc.c
> @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
> return ret;
> }
>
> +static int check_vmas_locked(unsigned long addr, unsigned long len)
> +{
> + struct file *file;
> + VMA_ITERATOR(vmi, current->mm, addr);
> + struct vm_area_struct *vma = vma_next(&vmi);
> + unsigned long end = addr + len;
> +
> + if (WARN_ON_ONCE(!vma))
> + return -EINVAL;
> +
> + file = vma->vm_file;
> + if (file && !is_file_hugepages(file))
> + return -EOPNOTSUPP;
> +
> + /* don't support file backed memory */
> + for_each_vma_range(vmi, vma, end) {
> + if (vma->vm_file != file)
> + return -EINVAL;
> +
> + if (file && !vma_is_shmem(vma))
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}

I really dislike this naming. There's no point to doing locked in the
naming here, it just makes people think it's checking whether the vmas
are locked. Which is not at all what it does. Because what else would we
think, there's nothing else in the name that suggests what it is
actually checking.

Don't put implied locking in the naming, the way to do that is to do
something ala:

lockdep_assert_held_read(&current->mm->mmap_lock);

though I don't think it's needed here at all, as there's just one caller
and it's clearly inside. You could even just make a comment instead.

So please rename this to indicate what it's ACTUALLY checking.

--
Jens Axboe

2023-04-19 17:00:44

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 10:35 AM, Jens Axboe wrote:
> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
>> We are shortly to remove the vmas parameter from pin_user_pages(), and
>> instead perform the required VMA checks ourselves. In most cases there will
>> be a single VMA so this should cause no undue impact on an already slow path.
>>
>> Doing this eliminates the one instance of vmas being used by
>> pin_user_pages().
>
> First up, please don't just send single patches from a series. It's
> really annoying when you are trying to get the full picture. Just CC the
> whole series, so reviewers don't have to look it up separately.
>
> So when you're doing a respin for what I'll mention below and the issue
> that David found, please don't just show us patch 4+5 of the series.

I'll reply here too rather than keep some of this conversation
out-of-band.

I don't necessarily think that making io buffer registration dumber and
less efficient by needing a separate vma lookup after the fact is a huge
deal, as I would imagine most workloads register buffers at setup time
and then don't change them. But if people do switch sets at runtime,
it's not necessarily a slow path. That said, I suspect the other bits
that we do in here, like the GUP, is going to dominate the overhead
anyway.

My main question is, why don't we just have a __pin_user_pages or
something helper that still takes the vmas argument, and drop it from
pin_user_pages() only? That'd still allow the cleanup of the other users
that don't care about the vma at all, while retaining the bundled
functionality for the case/cases that do? That would avoid needing
explicit vma iteration in io_uring.

--
Jens Axboe

2023-04-19 17:13:29

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 10:35:12AM -0600, Jens Axboe wrote:
> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
> > We are shortly to remove the vmas parameter from pin_user_pages(), and
> > instead perform the required VMA checks ourselves. In most cases there will
> > be a single VMA so this should cause no undue impact on an already slow path.
> >
> > Doing this eliminates the one instance of vmas being used by
> > pin_user_pages().
>
> First up, please don't just send single patches from a series. It's
> really annoying when you are trying to get the full picture. Just CC the
> whole series, so reviewers don't have to look it up separately.
>

Sorry about that. It's hard to strike the right balance between not
spamming people and giving appropriate context, and different people have
different opinions about how best to do this. In retrospect it would
certainly have been a good idea to include you on all.

> So when you're doing a respin for what I'll mention below and the issue
> that David found, please don't just show us patch 4+5 of the series.

ack

>
> > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > index 7a43aed8e395..3a927df9d913 100644
> > --- a/io_uring/rsrc.c
> > +++ b/io_uring/rsrc.c
> > @@ -1138,12 +1138,37 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
> > return ret;
> > }
> >
> > +static int check_vmas_locked(unsigned long addr, unsigned long len)
> > +{
> > + struct file *file;
> > + VMA_ITERATOR(vmi, current->mm, addr);
> > + struct vm_area_struct *vma = vma_next(&vmi);
> > + unsigned long end = addr + len;
> > +
> > + if (WARN_ON_ONCE(!vma))
> > + return -EINVAL;
> > +
> > + file = vma->vm_file;
> > + if (file && !is_file_hugepages(file))
> > + return -EOPNOTSUPP;
> > +
> > + /* don't support file backed memory */
> > + for_each_vma_range(vmi, vma, end) {
> > + if (vma->vm_file != file)
> > + return -EINVAL;
> > +
> > + if (file && !vma_is_shmem(vma))
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
>
> I really dislike this naming. There's no point to doing locked in the
> naming here, it just makes people think it's checking whether the vmas
> are locked. Which is not at all what it does. Because what else would we
> think, there's nothing else in the name that suggests what it is
> actually checking.
>
> Don't put implied locking in the naming, the way to do that is to do
> something ala:
>
> lockdep_assert_held_read(&current->mm->mmap_lock);
>
> though I don't think it's needed here at all, as there's just one caller
> and it's clearly inside. You could even just make a comment instead.
>
> So please rename this to indicate what it's ACTUALLY checking.

ack will do!

>
> --
> Jens Axboe
>

2023-04-19 17:24:51

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote:
> On 4/19/23 10:35 AM, Jens Axboe wrote:
> > On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
> >> We are shortly to remove the vmas parameter from pin_user_pages(), and
> >> instead perform the required VMA checks ourselves. In most cases there will
> >> be a single VMA so this should cause no undue impact on an already slow path.
> >>
> >> Doing this eliminates the one instance of vmas being used by
> >> pin_user_pages().
> >
> > First up, please don't just send single patches from a series. It's
> > really annoying when you are trying to get the full picture. Just CC the
> > whole series, so reviewers don't have to look it up separately.
> >
> > So when you're doing a respin for what I'll mention below and the issue
> > that David found, please don't just show us patch 4+5 of the series.
>
> I'll reply here too rather than keep some of this conversation
> out-of-band.
>
> I don't necessarily think that making io buffer registration dumber and
> less efficient by needing a separate vma lookup after the fact is a huge
> deal, as I would imagine most workloads register buffers at setup time
> and then don't change them. But if people do switch sets at runtime,
> it's not necessarily a slow path. That said, I suspect the other bits
> that we do in here, like the GUP, is going to dominate the overhead
> anyway.

Thanks, and indeed I expect the GUP will dominate.

>
> My main question is, why don't we just have a __pin_user_pages or
> something helper that still takes the vmas argument, and drop it from
> pin_user_pages() only? That'd still allow the cleanup of the other users
> that don't care about the vma at all, while retaining the bundled
> functionality for the case/cases that do? That would avoid needing
> explicit vma iteration in io_uring.
>

The desire here is to completely eliminate vmas as an externally available
parameter from GUP. While we do introduce a new helper that returns a VMA
(performing the lookup manually, as the remaining callers each look up a
single page and VMA), that helper sits outside of GUP itself.

Having a separate function that still bundled the vmas would essentially
undermine the purpose of the series altogether which is not just to clean
up some NULL's but rather to eliminate vmas as part of the GUP interface
altogether.

The reason for this is that by doing so we simplify the GUP interface,
eliminate a whole class of possible future bugs with people holding onto
pointers to vmas which may dangle and lead the way to future changes in GUP
which might be more impactful, such as trying to find means to use the fast
paths in more areas with an eye to gradual eradication of the use of
mmap_lock.

As long as we return VMAs, none of this is possible, and it also makes the
interface more confusing - without vmas GUP takes flags which define its
behaviour and in most cases returns page objects. The odd rules about what
can and cannot return vmas under what circumstances are not helpful for new
users.

Another point here is that Jason suggested adding a new
FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be
set. This could assert that only shmem/hugetlb file mappings are permitted
which would eliminate the need for you to perform this check at all.

This leads into the larger point that GUP-writing file mappings is
fundamentally broken due to e.g. GUP not honouring write notify so this
check should at least in theory not be necessary.

So it may be the case that should such a flag be added this code will
simply be deleted at a future point :)

> --
> Jens Axboe
>

2023-04-19 17:45:08

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 11:23 AM, Lorenzo Stoakes wrote:
> On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote:
>> On 4/19/23 10:35 AM, Jens Axboe wrote:
>>> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
>>>> We are shortly to remove the vmas parameter from pin_user_pages(), and
>>>> instead perform the required VMA checks ourselves. In most cases there will
>>>> be a single VMA so this should cause no undue impact on an already slow path.
>>>>
>>>> Doing this eliminates the one instance of vmas being used by
>>>> pin_user_pages().
>>>
>>> First up, please don't just send single patches from a series. It's
>>> really annoying when you are trying to get the full picture. Just CC the
>>> whole series, so reviewers don't have to look it up separately.
>>>
>>> So when you're doing a respin for what I'll mention below and the issue
>>> that David found, please don't just show us patch 4+5 of the series.
>>
>> I'll reply here too rather than keep some of this conversation
>> out-of-band.
>>
>> I don't necessarily think that making io buffer registration dumber and
>> less efficient by needing a separate vma lookup after the fact is a huge
>> deal, as I would imagine most workloads register buffers at setup time
>> and then don't change them. But if people do switch sets at runtime,
>> it's not necessarily a slow path. That said, I suspect the other bits
>> that we do in here, like the GUP, is going to dominate the overhead
>> anyway.
>
> Thanks, and indeed I expect the GUP will dominate.

Unless you have a lot of vmas... Point is, it's _probably_ not a
problem, but it might be, and it's making things worse for no real gain as
far as I can tell outside of some notion of "cleaning up the code".

>> My main question is, why don't we just have a __pin_user_pages or
>> something helper that still takes the vmas argument, and drop it from
>> pin_user_pages() only? That'd still allow the cleanup of the other users
>> that don't care about the vma at all, while retaining the bundled
>> functionality for the case/cases that do? That would avoid needing
>> explicit vma iteration in io_uring.
>>
>
> The desire here is to completely eliminate vmas as an externally available
> parameter from GUP. While we do have a newly introduced helper that returns
> a VMA, doing the lookup manually for all other vma cases (which look up a
> single page and vma), that is more so a helper that sits outside of GUP.
>
> Having a separate function that still bundled the vmas would essentially
> undermine the purpose of the series altogether which is not just to clean
> up some NULL's but rather to eliminate vmas as part of the GUP interface
> altogether.
>
> The reason for this is that by doing so we simplify the GUP interface,
> eliminate a whole class of possible future bugs with people holding onto
> pointers to vmas which may dangle and lead the way to future changes in GUP
> which might be more impactful, such as trying to find means to use the fast
> paths in more areas with an eye to gradual eradication of the use of
> mmap_lock.
>
> While we return VMAs, none of this is possible and it also makes the
> interface more confusing - without vmas GUP takes flags which define its
> behaviour and in most cases returns page objects. The odd rules about what
> can and cannot return vmas under what circumstances are not helpful for new
> users.
>
> Another point here is that Jason suggested adding a new
> FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be
> set. This could assert that only shmem/hugetlb file mappings are permitted
> which would eliminate the need for you to perform this check at all.
>
> This leads into the larger point that GUP-writing file mappings is
> fundamentally broken due to e.g. GUP not honouring write notify so this
> check should at least in theory not be necessary.
>
> So it may be the case that should such a flag be added this code will
> simply be deleted at a future point :)

Why don't we do that first then? There's nothing more permanent than a
temporary workaround/fix. Once it's in there, motivation to get rid of
it for most people is zero because they just never see it. Seems like
that'd be a much saner approach rather than the other way around, and
make this patchset simpler/cleaner too as it'd only be removing code in
all of the callers.

--
Jens Axboe

2023-04-19 17:51:49

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote:
> On 4/19/23 11:23 AM, Lorenzo Stoakes wrote:
> > On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote:
> >> On 4/19/23 10:35 AM, Jens Axboe wrote:
> >>> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
> >>>> We are shortly to remove the vmas parameter from pin_user_pages(), and
> >>>> instead perform the required VMA checks ourselves. In most cases there will
> >>>> be a single VMA so this should cause no undue impact on an already slow path.
> >>>>
> >>>> Doing this eliminates the one instance of vmas being used by
> >>>> pin_user_pages().
> >>>
> >>> First up, please don't just send single patches from a series. It's
> >>> really annoying when you are trying to get the full picture. Just CC the
> >>> whole series, so reviewers don't have to look it up separately.
> >>>
> >>> So when you're doing a respin for what I'll mention below and the issue
> >>> that David found, please don't just show us patch 4+5 of the series.
> >>
> >> I'll reply here too rather than keep some of this conversation
> >> out-of-band.
> >>
> >> I don't necessarily think that making io buffer registration dumber and
> >> less efficient by needing a separate vma lookup after the fact is a huge
> >> deal, as I would imagine most workloads register buffers at setup time
> >> and then don't change them. But if people do switch sets at runtime,
> >> it's not necessarily a slow path. That said, I suspect the other bits
> >> that we do in here, like the GUP, is going to dominate the overhead
> >> anyway.
> >
> > Thanks, and indeed I expect the GUP will dominate.
>
> Unless you have a lot of vmas... Point is, it's _probably_ not a
> problem, but it might and it's making things worse for no real gain as
> far as I can tell outside of some notion of "cleaning up the code".
>
> >> My main question is, why don't we just have a __pin_user_pages or
> >> something helper that still takes the vmas argument, and drop it from
> >> pin_user_pages() only? That'd still allow the cleanup of the other users
> >> that don't care about the vma at all, while retaining the bundled
> >> functionality for the case/cases that do? That would avoid needing
> >> explicit vma iteration in io_uring.
> >>
> >
> > The desire here is to completely eliminate vmas as an externally available
> > parameter from GUP. While we do have a newly introduced helper that returns
> > a VMA, doing the lookup manually for all other vma cases (which look up a
> > single page and vma), that is more so a helper that sits outside of GUP.
> >
> > Having a separate function that still bundled the vmas would essentially
> > undermine the purpose of the series altogether which is not just to clean
> > up some NULL's but rather to eliminate vmas as part of the GUP interface
> > altogether.
> >
> > The reason for this is that by doing so we simplify the GUP interface,
> > eliminate a whole class of possible future bugs with people holding onto
> > pointers to vmas which may dangle and lead the way to future changes in GUP
> > which might be more impactful, such as trying to find means to use the fast
> > paths in more areas with an eye to gradual eradication of the use of
> > mmap_lock.
> >
> > While we return VMAs, none of this is possible and it also makes the
> > interface more confusing - without vmas GUP takes flags which define its
> > behaviour and in most cases returns page objects. The odd rules about what
> > can and cannot return vmas under what circumstances are not helpful for new
> > users.
> >
> > Another point here is that Jason suggested adding a new
> > FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be
> > set. This could assert that only shmem/hugetlb file mappings are permitted
> > which would eliminate the need for you to perform this check at all.
> >
> > This leads into the larger point that GUP-writing file mappings is
> > fundamentally broken due to e.g. GUP not honouring write notify so this
> > check should at least in theory not be necessary.
> >
> > So it may be the case that should such a flag be added this code will
> > simply be deleted at a future point :)
>
> Why don't we do that first then? There's nothing more permanent than a
> temporary workaround/fix. Once it's in there, motivation to get rid of
> it for most people is zero because they just never see it. Seems like
> that'd be a much saner approach rather than the other way around, and
> make this patchset simpler/cleaner too as it'd only be removing code in
> all of the callers.
>

Because I'd then need to audit all GUP callers to see whether they in some
way brokenly access files in order to know which should and should not use
this new flag. It'd change this series from 'remove the vmas parameter' to
something a lot more involved.

I think it's much safer to do the two separately, as I feel that change
would need quite a bit of scrutiny too.

As for temporary, I can assure you I will be looking at introducing this
flag, for what it's worth :) and Jason is certainly minded to do work in
this area also.

> --
> Jens Axboe
>

2023-04-19 17:54:42

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 11:47 AM, Lorenzo Stoakes wrote:
> On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote:
>> On 4/19/23 11:23 AM, Lorenzo Stoakes wrote:
>>> On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote:
>>>> On 4/19/23 10:35 AM, Jens Axboe wrote:
>>>>> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
>>>>>> We are shortly to remove the vmas parameter from pin_user_pages(), and
>>>>>> instead perform the required VMA checks ourselves. In most cases there will
>>>>>> be a single VMA so this should cause no undue impact on an already slow path.
>>>>>>
>>>>>> Doing this eliminates the one instance of vmas being used by
>>>>>> pin_user_pages().
>>>>>
>>>>> First up, please don't just send single patches from a series. It's
>>>>> really annoying when you are trying to get the full picture. Just CC the
>>>>> whole series, so reviewers don't have to look it up separately.
>>>>>
>>>>> So when you're doing a respin for what I'll mention below and the issue
>>>>> that David found, please don't just show us patch 4+5 of the series.
>>>>
>>>> I'll reply here too rather than keep some of this conversation
>>>> out-of-band.
>>>>
>>>> I don't necessarily think that making io buffer registration dumber and
>>>> less efficient by needing a separate vma lookup after the fact is a huge
>>>> deal, as I would imagine most workloads register buffers at setup time
>>>> and then don't change them. But if people do switch sets at runtime,
>>>> it's not necessarily a slow path. That said, I suspect the other bits
>>>> that we do in here, like the GUP, is going to dominate the overhead
>>>> anyway.
>>>
>>> Thanks, and indeed I expect the GUP will dominate.
>>
>> Unless you have a lot of vmas... Point is, it's _probably_ not a
>> problem, but it might and it's making things worse for no real gain as
>> far as I can tell outside of some notion of "cleaning up the code".
>>
>>>> My main question is, why don't we just have a __pin_user_pages or
>>>> something helper that still takes the vmas argument, and drop it from
>>>> pin_user_pages() only? That'd still allow the cleanup of the other users
>>>> that don't care about the vma at all, while retaining the bundled
>>>> functionality for the case/cases that do? That would avoid needing
>>>> explicit vma iteration in io_uring.
>>>>
>>>
>>> The desire here is to completely eliminate vmas as an externally available
>>> parameter from GUP. While we do have a newly introduced helper that returns
>>> a VMA, doing the lookup manually for all other vma cases (which look up a
>>> single page and vma), that is more so a helper that sits outside of GUP.
>>>
>>> Having a separate function that still bundled the vmas would essentially
>>> undermine the purpose of the series altogether which is not just to clean
>>> up some NULL's but rather to eliminate vmas as part of the GUP interface
>>> altogether.
>>>
>>> The reason for this is that by doing so we simplify the GUP interface,
>>> eliminate a whole class of possible future bugs with people holding onto
>>> pointers to vmas which may dangle and lead the way to future changes in GUP
>>> which might be more impactful, such as trying to find means to use the fast
>>> paths in more areas with an eye to gradual eradication of the use of
>>> mmap_lock.
>>>
>>> While we return VMAs, none of this is possible and it also makes the
>>> interface more confusing - without vmas GUP takes flags which define its
>>> behaviour and in most cases returns page objects. The odd rules about what
>>> can and cannot return vmas under what circumstances are not helpful for new
>>> users.
>>>
>>> Another point here is that Jason suggested adding a new
>>> FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be
>>> set. This could assert that only shmem/hugetlb file mappings are permitted
>>> which would eliminate the need for you to perform this check at all.
>>>
>>> This leads into the larger point that GUP-writing file mappings is
>>> fundamentally broken due to e.g. GUP not honouring write notify so this
>>> check should at least in theory not be necessary.
>>>
>>> So it may be the case that should such a flag be added this code will
>>> simply be deleted at a future point :)
>>
>> Why don't we do that first then? There's nothing more permanent than a
>> temporary workaround/fix. Once it's in there, motivation to get rid of
>> it for most people is zero because they just never see it. Seems like
>> that'd be a much saner approach rather than the other way around, and
>> make this patchset simpler/cleaner too as it'd only be removing code in
>> all of the callers.
>>
>
> Because I'd then need to audit all GUP callers to see whether they in some
> way brokenly access files in order to know which should and should not use
> this new flag. It'd change this series from 'remove the vmas parameter' to
> something a lot more involved.
>
> I think it's much safer to do the two separately, as I feel that change
> would need quite a bit of scrutiny too.
>
> As for temporary, I can assure you I will be looking at introducing this
> flag, for what it's worth :) and Jason is certainly minded to do work in
> this area also.

It's either feasible or it's not, and it didn't sound too bad in terms
of getting it done to remove the temporary addition. Since we're now
days away from the merge window and any of this would need to soak in
for-next anyway for a bit, why not just do that other series first? It
really is backward. And this happens sometimes when developing
patchsets, at some point you realize that things would be easier/cleaner
with another prep series first. Nothing wrong with that, but let's not
be hesitant to shift direction a bit when it makes sense to do so.

I keep getting this sense of urgency for a cleanup series. Why not just
do it right from the get-go and make this series simpler? At that point
there would be no discussion on it at all, as it would be a straight
forward cleanup without adding an intermediate step that'd get deleted
later anyway.

--
Jens Axboe

2023-04-19 18:20:26

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 11:51:29AM -0600, Jens Axboe wrote:
> On 4/19/23 11:47 AM, Lorenzo Stoakes wrote:
> > On Wed, Apr 19, 2023 at 11:35:58AM -0600, Jens Axboe wrote:
> >> On 4/19/23 11:23 AM, Lorenzo Stoakes wrote:
> >>> On Wed, Apr 19, 2023 at 10:59:27AM -0600, Jens Axboe wrote:
> >>>> On 4/19/23 10:35 AM, Jens Axboe wrote:
> >>>>> On 4/18/23 9:49 AM, Lorenzo Stoakes wrote:
> >>>>>> We are shortly to remove the vmas parameter from pin_user_pages(), and
> >>>>>> instead perform the required VMA checks ourselves. In most cases there will
> >>>>>> be a single VMA so this should cause no undue impact on an already slow path.
> >>>>>>
> >>>>>> Doing this eliminates the one instance of vmas being used by
> >>>>>> pin_user_pages().
> >>>>>
> >>>>> First up, please don't just send single patches from a series. It's
> >>>>> really annoying when you are trying to get the full picture. Just CC the
> >>>>> whole series, so reviewers don't have to look it up separately.
> >>>>>
> >>>>> So when you're doing a respin for what I'll mention below and the issue
> >>>>> that David found, please don't just show us patch 4+5 of the series.
> >>>>
> >>>> I'll reply here too rather than keep some of this conversation
> >>>> out-of-band.
> >>>>
> >>>> I don't necessarily think that making io buffer registration dumber and
> >>>> less efficient by needing a separate vma lookup after the fact is a huge
> >>>> deal, as I would imagine most workloads register buffers at setup time
> >>>> and then don't change them. But if people do switch sets at runtime,
> >>>> it's not necessarily a slow path. That said, I suspect the other bits
> >>>> that we do in here, like the GUP, is going to dominate the overhead
> >>>> anyway.
> >>>
> >>> Thanks, and indeed I expect the GUP will dominate.
> >>
> >> Unless you have a lot of vmas... Point is, it's _probably_ not a
> >> problem, but it might and it's making things worse for no real gain as
> >> far as I can tell outside of some notion of "cleaning up the code".
> >>
> >>>> My main question is, why don't we just have a __pin_user_pages or
> >>>> something helper that still takes the vmas argument, and drop it from
> >>>> pin_user_pages() only? That'd still allow the cleanup of the other users
> >>>> that don't care about the vma at all, while retaining the bundled
> >>>> functionality for the case/cases that do? That would avoid needing
> >>>> explicit vma iteration in io_uring.
> >>>>
> >>>
> >>> The desire here is to completely eliminate vmas as an externally available
> >>> parameter from GUP. While we do have a newly introduced helper that returns
> >>> a VMA, doing the lookup manually for all other vma cases (which look up a
> >>> single page and vma), that is more so a helper that sits outside of GUP.
> >>>
> >>> Having a separate function that still bundled the vmas would essentially
> >>> undermine the purpose of the series altogether which is not just to clean
> >>> up some NULL's but rather to eliminate vmas as part of the GUP interface
> >>> altogether.
> >>>
> >>> The reason for this is that by doing so we simplify the GUP interface,
> >>> eliminate a whole class of possible future bugs with people holding onto
> >>> pointers to vmas which may dangle and lead the way to future changes in GUP
> >>> which might be more impactful, such as trying to find means to use the fast
> >>> paths in more areas with an eye to gradual eradication of the use of
> >>> mmap_lock.
> >>>
> >>> While we return VMAs, none of this is possible and it also makes the
> >>> interface more confusing - without vmas GUP takes flags which define its
> >>> behaviour and in most cases returns page objects. The odd rules about what
> >>> can and cannot return vmas under what circumstances are not helpful for new
> >>> users.
> >>>
> >>> Another point here is that Jason suggested adding a new
> >>> FOLL_ALLOW_BROKEN_FILE_MAPPINGS flag which would, by default, not be
> >>> set. This could assert that only shmem/hugetlb file mappings are permitted
> >>> which would eliminate the need for you to perform this check at all.
> >>>
> >>> This leads into the larger point that GUP-writing file mappings is
> >>> fundamentally broken due to e.g. GUP not honouring write notify so this
> >>> check should at least in theory not be necessary.
> >>>
> >>> So it may be the case that should such a flag be added this code will
> >>> simply be deleted at a future point :)
> >>
> >> Why don't we do that first then? There's nothing more permanent than a
> >> temporary workaround/fix. Once it's in there, motivation to get rid of
> >> it for most people is zero because they just never see it. Seems like
> >> that'd be a much saner approach rather than the other way around, and
> >> make this patchset simpler/cleaner too as it'd only be removing code in
> >> all of the callers.
> >>
> >
> > Because I'd then need to audit all GUP callers to see whether they in some
> > way brokenly access files in order to know which should and should not use
> > this new flag. It'd change this series from 'remove the vmas parameter' to
> > something a lot more involved.
> >
> > I think it's much safer to do the two separately, as I feel that change
> > would need quite a bit of scrutiny too.
> >
> > As for temporary, I can assure you I will be looking at introducing this
> > flag, for what it's worth :) and Jason is certainly minded to do work in
> > this area also.
>
> It's either feasible or it's not, and it didn't sound too bad in terms
> of getting it done to remove the temporary addition. Since we're now
> days away from the merge window and any of this would need to soak in
> for-next anyway for a bit, why not just do that other series first? It
> really is backward. And this happens sometimes when developing
> patchsets, at some point you realize that things would be easier/cleaner
> with another prep series first. Nothing wrong with that, but let's not
> be hesitant to shift direction a bit when it makes sense to do so.
>
> I keep getting this sense of urgency for a cleanup series. Why not just
> do it right from the get-go and make this series simpler? At that point
> there would be no discussion on it at all, as it would be a straight
> forward cleanup without adding an intermediate step that'd get deleted
> later anyway.

As I said, I think it is a little more than a cleanup, or at the very least
a cleanup that leads the way to more functional changes (and eliminates a
class of bugs).

I'd also argue that we are doing things right with this patch series as-is,
io_uring is the only sticking point because, believe it or not, it is the
only place in the kernel that uses multiple vmas (it has been interesting
to get a view on GUP use as a whole here).

But obviously if you have concerns about performance I understand (note that
the first iteration of this patch set added a flag specifically to avoid the
need for this, in order to cause you less trouble :))

I would argue that you're _already_ manipulating VMAs (though I do understand
your desire not to); the only things that would change are how you get them
and some duplicated work (likely less impactful).

So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
would still need to come along and delete a bunch of your code
afterwards. And unfortunately Pavel's recent change which insists on not
having different vm_file's across VMAs for the buffer would have to be
reverted so I expect it might not be entirely without discussion.

However, if you really do feel that you can't accept this change as-is, I
can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and
we can return to this afterwards.

>
> --
> Jens Axboe
>

2023-04-19 18:26:18

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:

> I'd also argue that we are doing things right with this patch series as-is,
> io_uring is the only sticking point because, believe it or not, it is the
> only place in the kernel that uses multiple vmas (it has been interesting
> to get a view on GUP use as a whole here).

I would say io_uring is the only place trying to open-code bug fixes
for MM problems :\ As Jens says, these sorts of temporary work arounds become
lingering problems that nobody wants to fix properly.

> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> would still need to come along and delete a bunch of your code
> afterwards. And unfortunately Pavel's recent change which insists on not
> having different vm_file's across VMAs for the buffer would have to be
> reverted so I expect it might not be entirely without discussion.

Yeah, that should just be reverted.

> However, if you really do feel that you can't accept this change as-is, I
> can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and
> we can return to this afterwards.

It is probably not as bad as you think; I suspect only RDMA really
wants to set the flag. Maybe something in media too.

Then again, that stuff works so incredibly badly that maybe there
really is no user of it and we should just block it completely.

Jason

2023-04-19 18:28:53

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > would still need to come along and delete a bunch of your code
> > afterwards. And unfortunately Pavel's recent change which insists on not
> > having different vm_file's across VMAs for the buffer would have to be
> > reverted so I expect it might not be entirely without discussion.
>
> I don't even understand why Pavel wanted to make this change. The
> commit log really doesn't say.
>
> commit edd478269640
> Author: Pavel Begunkov <[email protected]>
> Date: Wed Feb 22 14:36:48 2023 +0000
>
> io_uring/rsrc: disallow multi-source reg buffers
>
> If two or more mappings go back to back to each other they can be passed
> into io_uring to be registered as a single registered buffer. That would
> even work if mappings came from different sources, e.g. it's possible to
> mix in this way anon pages and pages from shmem or hugetlb. That is not
> a problem but it'd rather be less prone if we forbid such mixing.
>
> Cc: <[email protected]>
> Signed-off-by: Pavel Begunkov <[email protected]>
> Signed-off-by: Jens Axboe <[email protected]>
>
> It even says "That is not a problem"! So why was this patch merged
> if it's not fixing a problem?
>
> It's now standing in the way of an actual cleanup. So why don't we
> revert it? There must be more to it than this ...

https://lore.kernel.org/all/[email protected]/

Jason

2023-04-19 18:29:25

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> would still need to come along and delete a bunch of your code
> afterwards. And unfortunately Pavel's recent change which insists on not
> having different vm_file's across VMAs for the buffer would have to be
> reverted so I expect it might not be entirely without discussion.

I don't even understand why Pavel wanted to make this change. The
commit log really doesn't say.

commit edd478269640
Author: Pavel Begunkov <[email protected]>
Date: Wed Feb 22 14:36:48 2023 +0000

io_uring/rsrc: disallow multi-source reg buffers

If two or more mappings go back to back to each other they can be passed
into io_uring to be registered as a single registered buffer. That would
even work if mappings came from different sources, e.g. it's possible to
mix in this way anon pages and pages from shmem or hugetlb. That is not
a problem but it'd rather be less prone if we forbid such mixing.

Cc: <[email protected]>
Signed-off-by: Pavel Begunkov <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>

It even says "That is not a problem"! So why was this patch merged
if it's not fixing a problem?

It's now standing in the way of an actual cleanup. So why don't we
revert it? There must be more to it than this ...

2023-04-19 18:48:54

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
> > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > > would still need to come along and delete a bunch of your code
> > > afterwards. And unfortunately Pavel's recent change which insists on not
> > > having different vm_file's across VMAs for the buffer would have to be
> > > reverted so I expect it might not be entirely without discussion.
> >
> > I don't even understand why Pavel wanted to make this change. The
> > commit log really doesn't say.
> >
> > commit edd478269640
> > Author: Pavel Begunkov <[email protected]>
> > Date: Wed Feb 22 14:36:48 2023 +0000
> >
> > io_uring/rsrc: disallow multi-source reg buffers
> >
> > If two or more mappings go back to back to each other they can be passed
> > into io_uring to be registered as a single registered buffer. That would
> > even work if mappings came from different sources, e.g. it's possible to
> > mix in this way anon pages and pages from shmem or hugetlb. That is not
> > a problem but it'd rather be less prone if we forbid such mixing.
> >
> > Cc: <[email protected]>
> > Signed-off-by: Pavel Begunkov <[email protected]>
> > Signed-off-by: Jens Axboe <[email protected]>
> >
> > It even says "That is not a problem"! So why was this patch merged
> > if it's not fixing a problem?
> >
> > It's now standing in the way of an actual cleanup. So why don't we
> > revert it? There must be more to it than this ...
>
> https://lore.kernel.org/all/[email protected]/

So um, it's disallowed because Pavel couldn't understand why it
should be allowed? This gets less and less convincing.

FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
flag, which would use our shiny new VMA lock infrastructure to look
up and lock _one_ VMA instead of having the caller take the mmap_lock.
Passing that flag would be a tighter restriction than the one Pavel implemented,
but would certainly relieve some of his mental load.

By the way, even if all pages are from the same VMA, they may still be a
mixture of anon and file pages; think a MAP_PRIVATE of a file when
only some pages have been written to. Or an anon MAP_SHARED which is
accessible by a child process.

2023-04-19 18:56:27

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 07:35:33PM +0100, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
> > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
> > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > > > would still need to come along and delete a bunch of your code
> > > > afterwards. And unfortunately Pavel's recent change which insists on not
> > > > having different vm_file's across VMAs for the buffer would have to be
> > > > reverted so I expect it might not be entirely without discussion.
> > >
> > > I don't even understand why Pavel wanted to make this change. The
> > > commit log really doesn't say.
> > >
> > > commit edd478269640
> > > Author: Pavel Begunkov <[email protected]>
> > > Date: Wed Feb 22 14:36:48 2023 +0000
> > >
> > > io_uring/rsrc: disallow multi-source reg buffers
> > >
> > > If two or more mappings go back to back to each other they can be passed
> > > into io_uring to be registered as a single registered buffer. That would
> > > even work if mappings came from different sources, e.g. it's possible to
> > > mix in this way anon pages and pages from shmem or hugetlb. That is not
> > > a problem but it'd rather be less prone if we forbid such mixing.
> > >
> > > Cc: <[email protected]>
> > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > Signed-off-by: Jens Axboe <[email protected]>
> > >
> > > It even says "That is not a problem"! So why was this patch merged
> > > if it's not fixing a problem?
> > >
> > > It's now standing in the way of an actual cleanup. So why don't we
> > > revert it? There must be more to it than this ...
> >
> > https://lore.kernel.org/all/[email protected]/
>
> So um, it's disallowed because Pavel couldn't understand why it
> should be allowed? This gets less and less convincing.
>
> FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
> flag, which would use our shiny new VMA lock infrastructure to look
> up and lock _one_ VMA instead of having the caller take the mmap_lock.
> Passing that flag would be a tighter restriction that Pavel implemented,
> but would certainly relieve some of his mental load.
>
> By the way, even if all pages are from the same VMA, they may still be a
> mixture of anon and file pages; think a MAP_PRIVATE of a file when
> only some pages have been written to. Or an anon MAP_SHARED which is
> accessible by a child process.

Indeed, my motive for the series came out of a conversation with you about
vmas being odd (thanks! :), however I ended up feeling FOLL_SINGLE_VMA
would be too restrictive and would break the uAPI.

For example, imagine if a user (yes it'd be weird) mlock'd some pages in a
buffer and not others, then we'd break their use case. Also (perhaps?) more
feasibly, a user might mix hugetlb and anon pages. So I think that'd be too
restrictive here.

However the idea of just essentially taking what Jens has had to do
open-coded and putting it into GUP as a whole really feels like the right
thing to do.

I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the
majority of which want one and only one page. Perhaps it is worth taking the
helper added in this series (get_user_page_vma_remote() from [1]) and
replacing it with a full GUP function which has an interface explicitly
for this common single page/vma case.

[1]:https://lore.kernel.org/linux-mm/7c6f1ae88320bf11d2f583178a3d9e653e06ac63.1681831798.git.lstoakes@gmail.com/

2023-04-19 19:05:18

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 03:22:19PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
>
> > I'd also argue that we are doing things right with this patch series as-is,
> > io_uring is the only sticking point because, believe it or not, it is the
> > only place in the kernel that uses multiple vmas (it has been interesting
> > to get a view on GUP use as a whole here).
>
> I would say io_uring is the only place trying to open-code bug fixes
> for MM problems :\ As Jens says, these sorts of temporary work arounds become
> lingering problems that nobody wants to fix properly.
>
> > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > would still need to come along and delete a bunch of your code
> > afterwards. And unfortunately Pavel's recent change which insists on not
> > having different vm_file's across VMAs for the buffer would have to be
> > reverted so I expect it might not be entirely without discussion.
>
> Yeah, that should just be reverted.
>
> > However, if you really do feel that you can't accept this change as-is, I
> > can put this series on hold and look at FOLL_ALLOW_BROKEN_FILE_MAPPING and
> > we can return to this afterwards.
>
> It is probably not as bad as you think, I suspect only RDMA really
> wants to set the flag. Maybe something in media too, maybe.
>
> Then again that stuff doesn't work so incredibly badly maybe there
> really is no user of it and we should just block it completely.
>
> Jason

OK in this case I think we're all agreed that it's best to do this first
then revisit this series afterwards. I will switch to working on this!

And I will make sure to cc everyone in :)

2023-04-19 20:19:26

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 12:24?PM, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
>>> would still need to come along and delete a bunch of your code
>>> afterwards. And unfortunately Pavel's recent change which insists on not
>>> having different vm_file's across VMAs for the buffer would have to be
>>> reverted so I expect it might not be entirely without discussion.
>>
>> I don't even understand why Pavel wanted to make this change. The
>> commit log really doesn't say.
>>
>> commit edd478269640
>> Author: Pavel Begunkov <[email protected]>
>> Date: Wed Feb 22 14:36:48 2023 +0000
>>
>> io_uring/rsrc: disallow multi-source reg buffers
>>
>> If two or more mappings go back to back to each other they can be passed
>> into io_uring to be registered as a single registered buffer. That would
>> even work if mappings came from different sources, e.g. it's possible to
>> mix in this way anon pages and pages from shmem or hugetlb. That is not
>> a problem but it'd rather be less prone if we forbid such mixing.
>>
>> Cc: <[email protected]>
>> Signed-off-by: Pavel Begunkov <[email protected]>
>> Signed-off-by: Jens Axboe <[email protected]>
>>
>> It even says "That is not a problem"! So why was this patch merged
>> if it's not fixing a problem?
>>
>> It's now standing in the way of an actual cleanup. So why don't we
>> revert it? There must be more to it than this ...
>
> https://lore.kernel.org/all/[email protected]/

Let's just kill that patch, I can add a revert for 6.4. I had
forgotten about it and guess I didn't realize that most of the
issues do in fact just stem from that.

--
Jens Axboe

2023-04-19 20:24:28

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 2:15 PM, Jens Axboe wrote:
> On 4/19/23 12:24?PM, Jason Gunthorpe wrote:
>> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
>>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
>>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
>>>> would still need to come along and delete a bunch of your code
>>>> afterwards. And unfortunately Pavel's recent change which insists on not
>>>> having different vm_file's across VMAs for the buffer would have to be
>>>> reverted so I expect it might not be entirely without discussion.
>>>
>>> I don't even understand why Pavel wanted to make this change. The
>>> commit log really doesn't say.
>>>
>>> commit edd478269640
>>> Author: Pavel Begunkov <[email protected]>
>>> Date: Wed Feb 22 14:36:48 2023 +0000
>>>
>>> io_uring/rsrc: disallow multi-source reg buffers
>>>
>>> If two or more mappings go back to back to each other they can be passed
>>> into io_uring to be registered as a single registered buffer. That would
>>> even work if mappings came from different sources, e.g. it's possible to
>>> mix in this way anon pages and pages from shmem or hugetlb. That is not
>>> a problem but it'd rather be less prone if we forbid such mixing.
>>>
>>> Cc: <[email protected]>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> It even says "That is not a problem"! So why was this patch merged
>>> if it's not fixing a problem?
>>>
>>> It's now standing in the way of an actual cleanup. So why don't we
>>> revert it? There must be more to it than this ...
>>
>> https://lore.kernel.org/all/[email protected]/
>
> Let's just kill that patch that, I can add a revert for 6.4. I had
> forgotten about that patch and guess I didn't realize that most of the
> issue do in fact just stem from that.

https://git.kernel.dk/cgit/linux-block/commit/?h=for-6.4/io_uring&id=fbd3aaf37886d3645b1bd6920f6298f5884049f8

--
Jens Axboe


2023-04-19 23:38:41

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 07:45:06PM +0100, Lorenzo Stoakes wrote:

> For example, imagine if a user (yes it'd be weird) mlock'd some pages in a
> buffer and not others, then we'd break their use case. Also (perhaps?) more
> feasibly, a user might mix hugetlb and anon pages. So I think that'd be too
> restrictive here.

Yeah, I agree we should not add a broad single-vma restriction to
GUP. It turns any split of a VMA into a potentially uABI breaking
change and we just don't need that headache in the mm..

> I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the
> majority of which want one and one page only. Perhaps worth taking the
> helper added in this series (get_user_page_vma_remote() from [1]) and
> replacing it with an a full GUP function which has an interface explicitly
> for this common single page/vma case.

Like I showed in another thread a function signature that can only do
one page and also returns the VMA would force it to be used properly
and we don't need a FOLL flag.

Jason

2023-04-20 13:45:04

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 19:35, Matthew Wilcox wrote:
> On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
>> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
>>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
>>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
>>>> would still need to come along and delete a bunch of your code
>>>> afterwards. And unfortunately Pavel's recent change which insists on not
>>>> having different vm_file's across VMAs for the buffer would have to be
>>>> reverted so I expect it might not be entirely without discussion.
>>>
>>> I don't even understand why Pavel wanted to make this change. The
>>> commit log really doesn't say.
>>>
>>> commit edd478269640
>>> Author: Pavel Begunkov <[email protected]>
>>> Date: Wed Feb 22 14:36:48 2023 +0000
>>>
>>> io_uring/rsrc: disallow multi-source reg buffers
>>>
>>> If two or more mappings go back to back to each other they can be passed
>>> into io_uring to be registered as a single registered buffer. That would
>>> even work if mappings came from different sources, e.g. it's possible to
>>> mix in this way anon pages and pages from shmem or hugetlb. That is not
>>> a problem but it'd rather be less prone if we forbid such mixing.
>>>
>>> Cc: <[email protected]>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> It even says "That is not a problem"! So why was this patch merged
>>> if it's not fixing a problem?
>>>
>>> It's now standing in the way of an actual cleanup. So why don't we
>>> revert it? There must be more to it than this ...
>>
>> https://lore.kernel.org/all/[email protected]/
>
> So um, it's disallowed because Pavel couldn't understand why it
> should be allowed? This gets less and less convincing.

Excuse me? I'm really sorry you "couldn't understand" the explanation
as it has probably been too much of a "mental load", but let me try to
elaborate.

Because what can currently be registered is limited, it's indeed not
a big deal, but that will most certainly change, and I usually (and
apparently nonsensically) prefer to tighten things up _before_ it becomes
a problem. And again, taking a random set of buffers created for
different purposes and registering it as a single entity is IMHO not a
sane approach.

Take p2pdma for instance: if it had been passed without intermixing,
there might not have been an is_pci_p2pdma_page()/etc. check for every
single page in a bvec. It won't change for p2pdma now, but in general
there might be other cases in the future.


> FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
> flag, which would use our shiny new VMA lock infrastructure to look
> up and lock _one_ VMA instead of having the caller take the mmap_lock.
> Passing that flag would be a tighter restriction that Pavel implemented,
> but would certainly relieve some of his mental load.
>
> By the way, even if all pages are from the same VMA, they may still be a
> mixture of anon and file pages; think a MAP_PRIVATE of a file when
> only some pages have been written to. Or an anon MAP_SHARED which is
> accessible by a child process.

--
Pavel Begunkov

2023-04-20 13:51:29

by Pavel Begunkov

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On 4/19/23 21:15, Jens Axboe wrote:
> On 4/19/23 12:24?PM, Jason Gunthorpe wrote:
>> On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
>>> On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
>>>> So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
>>>> would still need to come along and delete a bunch of your code
>>>> afterwards. And unfortunately Pavel's recent change which insists on not
>>>> having different vm_file's across VMAs for the buffer would have to be
>>>> reverted so I expect it might not be entirely without discussion.
>>>
>>> I don't even understand why Pavel wanted to make this change. The
>>> commit log really doesn't say.
>>>
>>> commit edd478269640
>>> Author: Pavel Begunkov <[email protected]>
>>> Date: Wed Feb 22 14:36:48 2023 +0000
>>>
>>> io_uring/rsrc: disallow multi-source reg buffers
>>>
>>> If two or more mappings go back to back to each other they can be passed
>>> into io_uring to be registered as a single registered buffer. That would
>>> even work if mappings came from different sources, e.g. it's possible to
>>> mix in this way anon pages and pages from shmem or hugetlb. That is not
>>> a problem but it'd rather be less prone if we forbid such mixing.
>>>
>>> Cc: <[email protected]>
>>> Signed-off-by: Pavel Begunkov <[email protected]>
>>> Signed-off-by: Jens Axboe <[email protected]>
>>>
>>> It even says "That is not a problem"! So why was this patch merged
>>> if it's not fixing a problem?
>>>
>>> It's now standing in the way of an actual cleanup. So why don't we
>>> revert it? There must be more to it than this ...
>>
>> https://lore.kernel.org/all/[email protected]/
>
> Let's just kill that patch that, I can add a revert for 6.4. I had
> forgotten about that patch and guess I didn't realize that most of the
> issue do in fact just stem from that.

Well, we're now trading uapi sanity for cleanups; I don't know
what I should take away from that.

--
Pavel Begunkov

2023-04-20 14:07:07

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Wed, Apr 19, 2023 at 08:22:51PM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 19, 2023 at 07:45:06PM +0100, Lorenzo Stoakes wrote:
>
> > For example, imagine if a user (yes it'd be weird) mlock'd some pages in a
> > buffer and not others, then we'd break their use case. Also (perhaps?) more
> > feasibly, a user might mix hugetlb and anon pages. So I think that'd be too
> > restrictive here.
>
> Yeah, I agree we should not add a broad single-vma restriction to
> GUP. It turns any split of a VMA into a potentially uABI breaking
> change and we just don't need that headache in the mm..
>
> > I do like the idea of a FOLL_SINGLE_VMA for other use cases though, the
> > majority of which want one and one page only. Perhaps worth taking the
> > helper added in this series (get_user_page_vma_remote() from [1]) and
> > replacing it with an a full GUP function which has an interface explicitly
> > for this common single page/vma case.
>
> Like I showed in another thread a function signature that can only do
> one page and also returns the VMA would force it to be used properly
> and we don't need a FOLL flag.
>

Indeed, the latest spin of the series uses this. The point is that by doing
so we can use per-VMA locks for a common case; I was thinking perhaps as a
separate function call (or perhaps just reusing the wrapper).

This would be entirely separate to all the other work.

> Jason

2023-04-20 14:21:52

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Thu, Apr 20, 2023 at 02:36:47PM +0100, Pavel Begunkov wrote:
> On 4/19/23 19:35, Matthew Wilcox wrote:
> > On Wed, Apr 19, 2023 at 03:24:55PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Apr 19, 2023 at 07:23:00PM +0100, Matthew Wilcox wrote:
> > > > On Wed, Apr 19, 2023 at 07:18:26PM +0100, Lorenzo Stoakes wrote:
> > > > > So even if I did the FOLL_ALLOW_BROKEN_FILE_MAPPING patch series first, I
> > > > > would still need to come along and delete a bunch of your code
> > > > > afterwards. And unfortunately Pavel's recent change which insists on not
> > > > > having different vm_file's across VMAs for the buffer would have to be
> > > > > reverted so I expect it might not be entirely without discussion.
> > > >
> > > > I don't even understand why Pavel wanted to make this change. The
> > > > commit log really doesn't say.
> > > >
> > > > commit edd478269640
> > > > Author: Pavel Begunkov <[email protected]>
> > > > Date: Wed Feb 22 14:36:48 2023 +0000
> > > >
> > > > io_uring/rsrc: disallow multi-source reg buffers
> > > >
> > > > If two or more mappings go back to back to each other they can be passed
> > > > into io_uring to be registered as a single registered buffer. That would
> > > > even work if mappings came from different sources, e.g. it's possible to
> > > > mix in this way anon pages and pages from shmem or hugetlb. That is not
> > > > a problem but it'd rather be less prone if we forbid such mixing.
> > > >
> > > > Cc: <[email protected]>
> > > > Signed-off-by: Pavel Begunkov <[email protected]>
> > > > Signed-off-by: Jens Axboe <[email protected]>
> > > >
> > > > It even says "That is not a problem"! So why was this patch merged
> > > > if it's not fixing a problem?
> > > >
> > > > It's now standing in the way of an actual cleanup. So why don't we
> > > > revert it? There must be more to it than this ...
> > >
> > > https://lore.kernel.org/all/[email protected]/
> >
> > So um, it's disallowed because Pavel couldn't understand why it
> > should be allowed? This gets less and less convincing.
>
> Excuse me? I'm really sorry you "couldn't understand" the explanation
> as it has probably been too much of a "mental load", but let me try to
> elaborate.
>
> Because it's currently limited what can be registered, it's indeed not
> a big deal, but that will most certainly change, and I usually and
> apparently nonsensically prefer to tighten things up _before_ it becomes
> a problem. And again, taking a random set of buffers created for
> different purposes and registering it as a single entity is IMHO not a
> sane approach.
>
> Take p2pdma for instance, if would have been passed without intermixing
> there might not have been is_pci_p2pdma_page()/etc. for every single page
> in a bvec. That's why in general, it won't change for p2pdma but there
> might be other cases in the future.
>

The proposed changes for GUP are equally intended to tighten things up:
in advance of issues (e.g. misuse of the vmas parameter), for the purposes
of future improvements to GUP (optimising how we perform these operations)
and, most pertinently here, ensuring broken uses of GUP do not occur.

So both are 'cleanups' in some sense :) I think it's important to point out
that this change is not simply a desire to improve an interface, but has
practical implications going forward (which may not be obvious yet).

The new approach to this change goes further, in that we essentially
perform the existing check here (anon/shmem/hugetlb) for _all_ FOLL_WRITE
operations in GUP, to avoid broken writes to file mappings, at which point
we can remove the check here altogether (I will post a series for this
soon).

I think that you guys should not have to be performing sanity checks here
but rather we should handle it in GUP so you don't need to think about it
at all. It feels like an mm failing that you have had to do so at all.

So I guess the question is, do you feel the requirement for vm_file being
the same should apply across _any_ GUP operation over a range or is it
io_uring-specific?

If the former, then we should do it in GUP alongside the other checks (and
you can comment accordingly on that patch series when it comes); if the
latter, then I would restate my opinion that we shouldn't be prevented from
making improvements to GUP for the sake of one caller that wants to iterate
over these VMAs for custom checks - that should be done separately.

>
> > FWIW, what I was suggesting was that we should have a FOLL_SINGLE_VMA
> > flag, which would use our shiny new VMA lock infrastructure to look
> > up and lock _one_ VMA instead of having the caller take the mmap_lock.
> > Passing that flag would be a tighter restriction than Pavel implemented,
> > but would certainly relieve some of his mental load.
> >
> > By the way, even if all pages are from the same VMA, they may still be a
> > mixture of anon and file pages; think of a MAP_PRIVATE mapping of a file
> > where only some pages have been written to, or an anon MAP_SHARED mapping
> > which is accessible by a child process.
>
> --
> Pavel Begunkov

2023-04-20 15:37:33

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 4/6] io_uring: rsrc: avoid use of vmas parameter in pin_user_pages()

On Thu, Apr 20, 2023 at 03:19:01PM +0100, Lorenzo Stoakes wrote:

> So I guess the question is, do you feel the requirement for vm_file being
> the same should apply across _any_ GUP operation over a range or is it
> io_uring-specific?

No need at all anywhere.

GUP's API contract is you get a jumble of pages with certain
properties. Ie normal GUP gives you a jumble of CPU cache coherent
struct pages.

The job of the GUP caller is to process this jumble.

There is no architectural reason to restrict it beyond that, and
io_uring has no functional need either.

The caller of GUP is, by design, not supposed to know about files, or
VMAs, or other MM details. GUP's purpose is to abstract that.

I would not object if io_uring demanded that all struct pages be in a
certain way, like all are P2P or something. But that is a *struct
page* based test, not a VMA test.

Checking VMAs is basically io_uring saying it knows better than GUP
and better than the MM on how this abstraction should work.

Jason

2023-04-23 15:24:06

by Jarkko Sakkinen

[permalink] [raw]
Subject: Re: [PATCH v4 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On Tue Apr 18, 2023 at 6:49 PM EEST, Lorenzo Stoakes wrote:
> No invocation of get_user_pages() uses the vmas parameter, so remove
> it.
>
> The GUP API is confusing and caveated. Recent changes have done much to
> improve that, however there is more we can do. Exporting vmas is a prime
> target as the caller has to be extremely careful to preclude their use
> after the mmap_lock has expired or otherwise be left with dangling
> pointers.
>
> Removing the vmas parameter focuses the GUP functions upon their primary
> purpose - pinning (and outputting) pages as well as performing the actions
> implied by the input flags.
>
> This is part of a patch series aiming to remove the vmas parameter
> altogether.
>
> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
> Acked-by: Greg Kroah-Hartman <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
> drivers/misc/sgi-gru/grufault.c | 2 +-
> include/linux/mm.h | 3 +--
> mm/gup.c | 9 +++------
> mm/gup_test.c | 5 ++---
> virt/kvm/kvm_main.c | 2 +-
> 7 files changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> index 21ca0a831b70..5d390df21440 100644
> --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> @@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
> if (!(vma->vm_flags & VM_MAYEXEC))
> return -EACCES;
>
> - ret = get_user_pages(src, 1, 0, &src_page, NULL);
> + ret = get_user_pages(src, 1, 0, &src_page);
> if (ret < 1)
> return -EFAULT;
>
> diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
> index 1e8e287e113c..0597540f0dde 100644
> --- a/drivers/gpu/drm/radeon/radeon_ttm.c
> +++ b/drivers/gpu/drm/radeon/radeon_ttm.c
> @@ -362,7 +362,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
> struct page **pages = ttm->pages + pinned;
>
> r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
> - pages, NULL);
> + pages);
> if (r < 0)
> goto release_pages;
>
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index b836936e9747..378cf02a2aa1 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
> #else
> *pageshift = PAGE_SHIFT;
> #endif
> - if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
> + if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
> return -EFAULT;
> *paddr = page_to_phys(page);
> put_page(page);
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 37554b08bb28..b14cc4972d0b 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2380,8 +2380,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
> unsigned int gup_flags, struct page **pages,
> struct vm_area_struct **vmas, int *locked);
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas);
> + unsigned int gup_flags, struct page **pages);
> long pin_user_pages(unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> struct vm_area_struct **vmas);
> diff --git a/mm/gup.c b/mm/gup.c
> index 1f72a717232b..7e454d6b157e 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -2251,8 +2251,6 @@ long get_user_pages_remote(struct mm_struct *mm,
> * @pages: array that receives pointers to the pages pinned.
> * Should be at least nr_pages long. Or NULL, if caller
> * only intends to ensure the pages are faulted in.
> - * @vmas: array of pointers to vmas corresponding to each page.
> - * Or NULL if the caller does not require them.
> *
> * This is the same as get_user_pages_remote(), just with a less-flexible
> * calling convention where we assume that the mm being operated on belongs to
> @@ -2260,16 +2258,15 @@ long get_user_pages_remote(struct mm_struct *mm,
> * obviously don't pass FOLL_REMOTE in here.
> */
> long get_user_pages(unsigned long start, unsigned long nr_pages,
> - unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas)
> + unsigned int gup_flags, struct page **pages)
> {
> int locked = 1;
>
> - if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
> + if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
> return -EINVAL;
>
> return __get_user_pages_locked(current->mm, start, nr_pages, pages,
> - vmas, &locked, gup_flags);
> + NULL, &locked, gup_flags);
> }
> EXPORT_SYMBOL(get_user_pages);
>
> diff --git a/mm/gup_test.c b/mm/gup_test.c
> index 8ae7307a1bb6..9ba8ea23f84e 100644
> --- a/mm/gup_test.c
> +++ b/mm/gup_test.c
> @@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
> pages + i);
> break;
> case GUP_BASIC_TEST:
> - nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
> - NULL);
> + nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
> break;
> case PIN_FAST_BENCHMARK:
> nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
> @@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
> pages + i, NULL);
> else
> nr = get_user_pages(addr, nr, gup->gup_flags,
> - pages + i, NULL);
> + pages + i);
> break;
> default:
> ret = -EINVAL;
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d255964ec331..7f31e0a4adb5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2474,7 +2474,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> {
> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>
> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> + rc = get_user_pages(addr, 1, flags, NULL);
> return rc == -EHWPOISON;
> }
>
> --
> 2.40.0


Acked-by: Jarkko Sakkinen <[email protected]>

BR, Jarkko