2023-05-14 21:29:16

by Lorenzo Stoakes

Subject: [PATCH v5 0/6] remove the vmas parameter from GUP APIs

(pin_/get)_user_pages[_remote]() each provide an optional output parameter
for an array of VMA objects associated with each page in the input range.

These provide the means for VMAs to be returned, as long as mm->mmap_lock
is never released during the GUP operation (i.e. the internal flag
FOLL_UNLOCKABLE is not specified).

In addition, these VMAs can only be accessed with the mmap_lock held and
become invalidated the moment it is released.
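
For illustration, the old calling convention made it easy to write a
use-after-unlock - a hypothetical sketch, not taken from any real caller:

	static bool vma_writable_after_unlock(struct mm_struct *mm,
					      unsigned long addr)
	{
		struct vm_area_struct *vma;
		struct page *page;
		long ret;

		mmap_read_lock(mm);
		ret = get_user_pages_remote(mm, addr, 1, 0, &page, &vma, NULL);
		mmap_read_unlock(mm);
		if (ret <= 0)
			return false;
		put_page(page);

		/* BUG: vma became dangling when the lock was dropped */
		return vma->vm_flags & VM_WRITE;
	}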

The vast majority of invocations do not use this functionality and, of
those that do, all but one retrieve a single VMA to perform checks upon.

It is not egregious in the single-VMA cases to simply replace the
operation with a vma_lookup(). In these cases we duplicate the (fast)
lookup on a slow path already under the mmap_lock; this is abstracted into
a new get_user_page_vma_remote() inline helper function which also performs
error checking and reference count maintenance.
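
At such call sites the conversion is roughly of this shape (an illustrative
sketch, not an actual hunk from this series):

-	ret = get_user_pages_remote(mm, addr, 1, gup_flags, &page,
-				    &vma, NULL);
-	if (ret <= 0)
-		goto out;
+	page = get_user_page_vma_remote(mm, addr, gup_flags, &vma);
+	if (IS_ERR_OR_NULL(page))
+		goto out;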

The special case is io_uring, where io_pin_pages() specifically needs to
assert that the VMAs underlying the range do not result in broken long-term
GUP file-backed mappings.

As GUP now internally asserts that FOLL_LONGTERM pins are not made against
file-backed mappings in a broken fashion (i.e. ones requiring dirty
tracking) - as implemented in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
writing to file-backed mappings" - this logic is no longer required and so
we can simply remove it altogether from io_uring.

Eliminating the vmas parameter eliminates an entire class of dangling
pointer errors that might have occurred had the lock been incorrectly
released.

In addition, the API is simplified and now clearly expresses what it is
intended for - applying the specified GUP flags and (if pinning) returning
pinned pages.

This change additionally opens the door to further potential improvements
in GUP and the possible marrying of disparate code paths.

I have run this series against gup_test with no issues.

This patch series is rebased on mm-unstable as of 12th May.

Thanks to Matthew Wilcox for suggesting this refactoring!

v5:
- Remove the io_uring open-coded VMA file-backed check, as this is now
explicitly disallowed by GUP.
- Update the subsequent patch to eliminate the vmas parameter accordingly.

v4:
- Drop FOLL_SAME_FILE as the complexity costs exceed the benefit of having it
for a single case.
- Update io_pin_pages() to perform VMA lookup directly.
- Add get_user_page_vma_remote() to perform the single page/VMA lookup with
error checks performed correctly.
https://lore.kernel.org/linux-mm/[email protected]/

v3:
- Always explicitly handle !vma cases, feeding an error back to the user
where appropriate to indicate that the operation did not completely
succeed, and always with a warning since these conditions should be
impossible.
https://lore.kernel.org/linux-mm/[email protected]/

v2:
- Only look up the VMA if the pin succeeded (other than in
__access_remote_vm(), which has different semantics)
- Be pedantically careful about ensuring that under no circumstances can we
fail to unpin a page
https://lore.kernel.org/linux-mm/[email protected]/

v1:
https://lore.kernel.org/linux-mm/[email protected]/

Lorenzo Stoakes (6):
mm/gup: remove unused vmas parameter from get_user_pages()
mm/gup: remove unused vmas parameter from pin_user_pages_remote()
mm/gup: remove vmas parameter from get_user_pages_remote()
io_uring: rsrc: delegate VMA file-backed check to GUP
mm/gup: remove vmas parameter from pin_user_pages()
mm/gup: remove vmas array from internal GUP functions

arch/arm64/kernel/mte.c | 17 ++--
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/iommu/iommufd/pages.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
fs/exec.c | 2 +-
include/linux/hugetlb.h | 10 +-
include/linux/mm.h | 42 +++++++--
io_uring/rsrc.c | 34 ++-----
kernel/events/uprobes.c | 13 +--
mm/gup.c | 105 +++++++--------------
mm/gup_test.c | 14 ++-
mm/hugetlb.c | 24 ++---
mm/memory.c | 14 +--
mm/process_vm_access.c | 2 +-
mm/rmap.c | 2 +-
net/xdp/xdp_umem.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +-
virt/kvm/kvm_main.c | 2 +-
29 files changed, 138 insertions(+), 178 deletions(-)

--
2.40.1


2023-05-14 21:30:34

by Lorenzo Stoakes

Subject: [PATCH v5 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

The only invocations of get_user_pages_remote() which used the vmas
parameter were for a single page, and these can instead simply look up the
VMA directly. In particular:-

- __update_ref_ctr() looked up the VMA but did nothing with it, so we
simply remove the lookup.

- __access_remote_vm() already used vma_lookup() when the original lookup
failed, so performing the lookup directly also de-duplicates the code.

We are able to perform these VMA operations as we already hold the
mmap_lock in order to be able to call get_user_pages_remote().

As part of this work we add get_user_page_vma_remote() which abstracts the
VMA lookup, error handling and decrementing the page reference count should
the VMA lookup fail.

This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.
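
The helper folds three outcomes into a single return value - a sketch of
the resulting calling convention (the -EIO mapping here is caller-specific,
per the mte.c conversion below):

	struct vm_area_struct *vma;
	struct page *page = get_user_page_vma_remote(mm, addr,
						     gup_flags, &vma);

	if (IS_ERR_OR_NULL(page)) {
		/* NULL: nothing was pinned; ERR_PTR(): hard error */
		return page == NULL ? -EIO : PTR_ERR(page);
	}

	/* exactly one page pinned; vma valid while mmap_lock is held */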

Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Janosch Frank <[email protected]> (for s390)
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/arm64/kernel/mte.c | 17 +++++++++--------
arch/s390/kvm/interrupt.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 34 +++++++++++++++++++++++++++++++---
kernel/events/uprobes.c | 13 +++++--------
mm/gup.c | 12 ++++--------
mm/memory.c | 14 +++++++-------
mm/rmap.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +--
10 files changed, 61 insertions(+), 40 deletions(-)

diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..cc793c246653 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -419,10 +419,9 @@ long get_mte_ctrl(struct task_struct *task)
static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
struct iovec *kiov, unsigned int gup_flags)
{
- struct vm_area_struct *vma;
void __user *buf = kiov->iov_base;
size_t len = kiov->iov_len;
- int ret;
+ int err = 0;
int write = gup_flags & FOLL_WRITE;

if (!access_ok(buf, len))
@@ -432,14 +431,16 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
return -EIO;

while (len) {
+ struct vm_area_struct *vma;
unsigned long tags, offset;
void *maddr;
- struct page *page = NULL;
+ struct page *page = get_user_page_vma_remote(mm, addr,
+ gup_flags, &vma);

- ret = get_user_pages_remote(mm, addr, 1, gup_flags, &page,
- &vma, NULL);
- if (ret <= 0)
+ if (IS_ERR_OR_NULL(page)) {
+ err = page == NULL ? -EIO : PTR_ERR(page);
break;
+ }

/*
* Only copy tags if the page has been mapped as PROT_MTE
@@ -449,7 +450,7 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
* was never mapped with PROT_MTE.
*/
if (!(vma->vm_flags & VM_MTE)) {
- ret = -EOPNOTSUPP;
+ err = -EOPNOTSUPP;
put_page(page);
break;
}
@@ -482,7 +483,7 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
kiov->iov_len = buf - kiov->iov_base;
if (!kiov->iov_len) {
/* check for error accessing the tracee's address space */
- if (ret <= 0)
+ if (err)
return -EIO;
else
return -EFAULT;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index da6dac36e959..9bd0a873f3b1 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2777,7 +2777,7 @@ static struct page *get_map_page(struct kvm *kvm, u64 uaddr)

mmap_read_lock(kvm->mm);
get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(kvm->mm);
return page;
}
diff --git a/fs/exec.c b/fs/exec.c
index a466e797c8e2..25c65b64544b 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -220,7 +220,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8ea82e9e7719..679b41ef7a6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2366,6 +2366,9 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,
unmap_mapping_range(mapping, holebegin, holelen, 0);
}

+static inline struct vm_area_struct *vma_lookup(struct mm_struct *mm,
+ unsigned long addr);
+
extern int access_process_vm(struct task_struct *tsk, unsigned long addr,
void *buf, int len, unsigned int gup_flags);
extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
@@ -2374,13 +2377,38 @@ extern int __access_remote_vm(struct mm_struct *mm, unsigned long addr,
void *buf, int len, unsigned int gup_flags);

long get_user_pages_remote(struct mm_struct *mm,
- unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ unsigned long start, unsigned long nr_pages,
+ unsigned int gup_flags, struct page **pages,
+ int *locked);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
int *locked);
+
+static inline struct page *get_user_page_vma_remote(struct mm_struct *mm,
+ unsigned long addr,
+ int gup_flags,
+ struct vm_area_struct **vmap)
+{
+ struct page *page;
+ struct vm_area_struct *vma;
+ int got = get_user_pages_remote(mm, addr, 1, gup_flags, &page, NULL);
+
+ if (got < 0)
+ return ERR_PTR(got);
+ if (got == 0)
+ return NULL;
+
+ vma = vma_lookup(mm, addr);
+ if (WARN_ON_ONCE(!vma)) {
+ put_page(page);
+ return ERR_PTR(-EINVAL);
+ }
+
+ *vmap = vma;
+ return page;
+}
+
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 59887c69d54c..cac3aef7c6f7 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -365,7 +365,6 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
{
void *kaddr;
struct page *page;
- struct vm_area_struct *vma;
int ret;
short *ptr;

@@ -373,7 +372,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
return -EINVAL;

ret = get_user_pages_remote(mm, vaddr, 1,
- FOLL_WRITE, &page, &vma, NULL);
+ FOLL_WRITE, &page, NULL);
if (unlikely(ret <= 0)) {
/*
* We are asking for 1 page. If get_user_pages_remote() fails,
@@ -474,10 +473,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
if (is_register)
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
- ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
- &old_page, &vma, NULL);
- if (ret <= 0)
- return ret;
+ old_page = get_user_page_vma_remote(mm, vaddr, gup_flags, &vma);
+ if (IS_ERR_OR_NULL(old_page))
+ return PTR_ERR(old_page);

ret = verify_opcode(old_page, vaddr, &opcode);
if (ret <= 0)
@@ -2027,8 +2025,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* but we treat this as a 'remote' access since it is
* essentially a kernel access to the memory.
*/
- result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page,
- NULL, NULL);
+ result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
if (result < 0)
return result;

diff --git a/mm/gup.c b/mm/gup.c
index ce78a5186dbb..1493cc8dd526 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2208,8 +2208,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -2224,8 +2222,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held for read or write.
*
* get_user_pages_remote walks a process's page tables and takes a reference
@@ -2262,15 +2258,15 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;

- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2280,7 +2276,7 @@ EXPORT_SYMBOL(get_user_pages_remote);
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 146bb94764f8..63632a5eafc1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5590,7 +5590,6 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
int len, unsigned int gup_flags)
{
- struct vm_area_struct *vma;
void *old_buf = buf;
int write = gup_flags & FOLL_WRITE;

@@ -5599,13 +5598,15 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,

/* ignore errors, just check how much was successfully transferred */
while (len) {
- int bytes, ret, offset;
+ int bytes, offset;
void *maddr;
- struct page *page = NULL;
+ struct vm_area_struct *vma;
+ struct page *page = get_user_page_vma_remote(mm, addr,
+ gup_flags, &vma);
+
+ if (IS_ERR_OR_NULL(page)) {
+ int ret = 0;

- ret = get_user_pages_remote(mm, addr, 1,
- gup_flags, &page, &vma, NULL);
- if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
#else
@@ -5613,7 +5614,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
* Check if this is a VM_IO | VM_PFNMAP VMA, which
* we can access using slightly different code.
*/
- vma = vma_lookup(mm, addr);
if (!vma)
break;
if (vma->vm_ops && vma->vm_ops->access)
diff --git a/mm/rmap.c b/mm/rmap.c
index b42fc0389c24..ae127f60a4fb 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2328,7 +2328,7 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,

npages = get_user_pages_remote(mm, start, npages,
FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
- pages, NULL, NULL);
+ pages, NULL);
if (npages < 0)
return npages;

diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 31af29f669d2..ac20c0bdff9d 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -916,7 +916,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1,
- FOLL_FORCE, &page, NULL, NULL);
+ FOLL_FORCE, &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return false;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 9bfe1d6f6529..e033c79d528e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -61,8 +61,7 @@ static void async_pf_execute(struct work_struct *work)
* access remotely.
*/
mmap_read_lock(mm);
- get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, NULL,
- &locked);
+ get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
if (locked)
mmap_read_unlock(mm);

--
2.40.1


2023-05-14 21:30:40

by Lorenzo Stoakes

Subject: [PATCH v5 2/6] mm/gup: remove unused vmas parameter from pin_user_pages_remote()

No invocation of pin_user_pages_remote() uses the vmas parameter, so remove
it. This forms part of a larger patch set eliminating the use of the vmas
parameters altogether.

Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
drivers/iommu/iommufd/pages.c | 4 ++--
drivers/vfio/vfio_iommu_type1.c | 2 +-
include/linux/mm.h | 2 +-
mm/gup.c | 8 +++-----
mm/process_vm_access.c | 2 +-
5 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index 3c47846cc5ef..412ca96be128 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -786,7 +786,7 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
user->locked = 1;
}
rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
- user->gup_flags, user->upages, NULL,
+ user->gup_flags, user->upages,
&user->locked);
}
if (rc <= 0) {
@@ -1799,7 +1799,7 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
rc = pin_user_pages_remote(
pages->source_mm, (uintptr_t)(pages->uptr + index * PAGE_SIZE),
1, (flags & IOMMUFD_ACCESS_RW_WRITE) ? FOLL_WRITE : 0, &page,
- NULL, NULL);
+ NULL);
mmap_read_unlock(pages->source_mm);
if (rc != 1) {
if (WARN_ON(rc >= 0))
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3d4dd9420c30..3d2d9a944906 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -562,7 +562,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,

mmap_read_lock(mm);
ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
- pages, NULL, NULL);
+ pages, NULL);
if (ret > 0) {
int i;

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2c1a92bf5626..8ea82e9e7719 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2380,7 +2380,7 @@ long get_user_pages_remote(struct mm_struct *mm,
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index b8189396f435..ce78a5186dbb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3243,8 +3243,6 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -3259,14 +3257,14 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, vmas,
+ return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 78dfaf9e8990..0523edab03a6 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -104,7 +104,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
mmap_read_lock(mm);
pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
flags, process_pages,
- NULL, &locked);
+ &locked);
if (locked)
mmap_read_unlock(mm);
if (pinned_pages <= 0)
--
2.40.1


2023-05-14 21:31:03

by Lorenzo Stoakes

Subject: [PATCH v5 4/6] io_uring: rsrc: delegate VMA file-backed check to GUP

Now that GUP explicitly checks FOLL_LONGTERM pin_user_pages() against
broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
writing to file-backed mappings", there is no need to explicitly check VMAs
for this condition, so simply remove this logic from io_uring altogether.
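
For reference, the check GUP now performs internally is conceptually
equivalent to the per-VMA test deleted below - an illustrative sketch only,
longterm_pin_ok() being a hypothetical name rather than the helper the
prerequisite patch actually adds:

	static bool longterm_pin_ok(struct vm_area_struct *vma)
	{
		/* shmem and hugetlbfs need no dirty tracking, so are fine */
		if (vma_is_shmem(vma))
			return true;
		if (vma->vm_file && !is_file_hugepages(vma->vm_file))
			return false;	/* broken for long-term pins */
		return true;
	}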

Signed-off-by: Lorenzo Stoakes <[email protected]>
---
io_uring/rsrc.c | 34 ++++++----------------------------
1 file changed, 6 insertions(+), 28 deletions(-)

diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index d46f72a5ef73..b6451f8bc5d5 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1030,9 +1030,8 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
- struct vm_area_struct **vmas = NULL;
struct page **pages = NULL;
- int i, pret, ret = -ENOMEM;
+ int pret, ret = -ENOMEM;

end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
start = ubuf >> PAGE_SHIFT;
@@ -1042,45 +1041,24 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
if (!pages)
goto done;

- vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
- GFP_KERNEL);
- if (!vmas)
- goto done;
-
ret = 0;
mmap_read_lock(current->mm);
pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, vmas);
- if (pret == nr_pages) {
- /* don't support file backed memory */
- for (i = 0; i < nr_pages; i++) {
- struct vm_area_struct *vma = vmas[i];
-
- if (vma_is_shmem(vma))
- continue;
- if (vma->vm_file &&
- !is_file_hugepages(vma->vm_file)) {
- ret = -EOPNOTSUPP;
- break;
- }
- }
+ pages, NULL);
+ if (pret == nr_pages)
*npages = nr_pages;
- } else {
+ else
ret = pret < 0 ? pret : -EFAULT;
- }
+
mmap_read_unlock(current->mm);
if (ret) {
- /*
- * if we did partial map, or found file backed vmas,
- * release any pages we did get
- */
+ /* if we did partial map, release any pages we did get */
if (pret > 0)
unpin_user_pages(pages, pret);
goto done;
}
ret = 0;
done:
- kvfree(vmas);
if (ret < 0) {
kvfree(pages);
pages = ERR_PTR(ret);
--
2.40.1


2023-05-14 21:31:30

by Lorenzo Stoakes

Subject: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

No invocation of get_user_pages() uses the vmas parameter, so remove it.

The GUP API is confusing and caveated. Recent changes have done much to
improve that; however, there is more we can do. Exporting vmas is a prime
target, as the caller has to be extremely careful to preclude their use
after the mmap_lock has been released or otherwise be left with dangling
pointers.

Removing the vmas parameter focuses the GUP functions upon their primary
purpose - pinning (and outputting) pages as well as performing the actions
implied by the input flags.

This is part of a patch series aiming to remove the vmas parameter
altogether.
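
The per-call-site change is mechanical, e.g. (representative of the hunks
below):

-	ret = get_user_pages(src, 1, 0, &src_page, NULL);
+	ret = get_user_pages(src, 1, 0, &src_page);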

Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Acked-by: Christian König <[email protected]> (for radeon parts)
Acked-by: Jarkko Sakkinen <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
include/linux/mm.h | 3 +--
mm/gup.c | 9 +++------
mm/gup_test.c | 5 ++---
virt/kvm/kvm_main.c | 2 +-
7 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 21ca0a831b70..5d390df21440 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
if (!(vma->vm_flags & VM_MAYEXEC))
return -EACCES;

- ret = get_user_pages(src, 1, 0, &src_page, NULL);
+ ret = get_user_pages(src, 1, 0, &src_page);
if (ret < 1)
return -EFAULT;

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 2220cdf6a3f6..3a9db030f98f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -359,7 +359,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
struct page **pages = ttm->pages + pinned;

r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
- pages, NULL);
+ pages);
if (r < 0)
goto release_pages;

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index b836936e9747..378cf02a2aa1 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
#else
*pageshift = PAGE_SHIFT;
#endif
- if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
+ if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
put_page(page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index db3f66ed2f32..2c1a92bf5626 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2382,8 +2382,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
diff --git a/mm/gup.c b/mm/gup.c
index 90d9b65ff35c..b8189396f435 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2294,8 +2294,6 @@ long get_user_pages_remote(struct mm_struct *mm,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* This is the same as get_user_pages_remote(), just with a less-flexible
* calling convention where we assume that the mm being operated on belongs to
@@ -2303,16 +2301,15 @@ long get_user_pages_remote(struct mm_struct *mm,
* obviously don't pass FOLL_REMOTE in here.
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;

- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- vmas, &locked, gup_flags);
+ NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);

diff --git a/mm/gup_test.c b/mm/gup_test.c
index 8ae7307a1bb6..9ba8ea23f84e 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case GUP_BASIC_TEST:
- nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_FAST_BENCHMARK:
nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
@@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i, NULL);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
break;
default:
ret = -EINVAL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cb5c13eee193..eaa5bb8dbadc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;

- rc = get_user_pages(addr, 1, flags, NULL, NULL);
+ rc = get_user_pages(addr, 1, flags, NULL);
return rc == -EHWPOISON;
}

--
2.40.1


2023-05-14 21:44:35

by Lorenzo Stoakes

Subject: [PATCH v5 5/6] mm/gup: remove vmas parameter from pin_user_pages()

We are now in a position where no caller of pin_user_pages() requires the
vmas parameter at all, so eliminate this parameter from the function and
all callers.

This clears the way to removing the vmas parameter from GUP altogether.

Acked-by: David Hildenbrand <[email protected]>
Acked-by: Dennis Dalessandro <[email protected]> (for qib)
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
include/linux/mm.h | 3 +--
io_uring/rsrc.c | 2 +-
mm/gup.c | 9 +++------
mm/gup_test.c | 9 ++++-----
net/xdp/xdp_umem.c | 2 +-
12 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 81d7185e2ae8..d19fb1f3007d 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -105,7 +105,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,

ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
FOLL_WRITE | FOLL_LONGTERM,
- mem->hpages + entry, NULL);
+ mem->hpages + entry);
if (ret == n) {
pinned += n;
continue;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index f693bc753b6b..1bb7507325bc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -111,7 +111,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
ret = pin_user_pages(start_page + got * PAGE_SIZE,
num_pages - got,
FOLL_LONGTERM | FOLL_WRITE,
- p + got, NULL);
+ p + got);
if (ret < 0) {
mmap_read_unlock(current->mm);
goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 2a5cac2658ec..84e0f41e7dfa 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -140,7 +140,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
ret = pin_user_pages(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof(struct page *)),
- gup_flags, page_list, NULL);
+ gup_flags, page_list);

if (ret < 0)
goto out;
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index f51ab2ccf151..e6e25f15567d 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -422,7 +422,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
umem->page_chunk[i].plist = plist;
while (nents) {
rv = pin_user_pages(first_page_va, nents, foll_flags,
- plist, NULL);
+ plist);
if (rv < 0)
goto out_sem_up;

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 53001532e8e3..405b89ea1054 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -180,7 +180,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
data, size, dma->nr_pages);

err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags,
- dma->pages, NULL);
+ dma->pages);

if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index de97e38c3b82..4d4405f058e8 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -1052,7 +1052,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
goto out;

pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
- page_list, NULL);
+ page_list);
if (pinned != npages) {
ret = pinned < 0 ? pinned : -ENOMEM;
goto out;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 8c1aefc865f0..61223fcbe82b 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -983,7 +983,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
while (npages) {
sz2pin = min_t(unsigned long, npages, list_size);
pinned = pin_user_pages(cur_base, sz2pin,
- gup_flags, page_list, NULL);
+ gup_flags, page_list);
if (sz2pin != pinned) {
if (pinned < 0) {
ret = pinned;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 679b41ef7a6d..db09c7062965 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2412,8 +2412,7 @@ static inline struct page *get_user_page_vma_remote(struct mm_struct *mm,
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index b6451f8bc5d5..b56bda46a9eb 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1044,7 +1044,7 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
ret = 0;
mmap_read_lock(current->mm);
pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, NULL);
+ pages);
if (pret == nr_pages)
*npages = nr_pages;
else
diff --git a/mm/gup.c b/mm/gup.c
index 1493cc8dd526..36701b5f0123 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3274,8 +3274,6 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and
* FOLL_PIN is set.
@@ -3284,15 +3282,14 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* see Documentation/core-api/pin_user_pages.rst for details.
*/
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;

- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, vmas, &locked, gup_flags);
+ pages, NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);

diff --git a/mm/gup_test.c b/mm/gup_test.c
index 9ba8ea23f84e..1668ce0e0783 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -146,18 +146,17 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case PIN_BASIC_TEST:
- nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_LONGTERM_BENCHMARK:
nr = pin_user_pages(addr, nr,
gup->gup_flags | FOLL_LONGTERM,
- pages + i, NULL);
+ pages + i);
break;
case DUMP_USER_PAGES_TEST:
if (gup->test_flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
nr = pin_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
pages + i);
@@ -270,7 +269,7 @@ static inline int pin_longterm_test_start(unsigned long arg)
gup_flags, pages);
else
cur_pages = pin_user_pages(addr, remaining_pages,
- gup_flags, pages, NULL);
+ gup_flags, pages);
if (cur_pages < 0) {
pin_longterm_test_stop();
ret = cur_pages;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 02207e852d79..06cead2b8e34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -103,7 +103,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)

mmap_read_lock(current->mm);
npgs = pin_user_pages(address, umem->npgs,
- gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+ gup_flags | FOLL_LONGTERM, &umem->pgs[0]);
mmap_read_unlock(current->mm);

if (npgs != umem->npgs) {
--
2.40.1


2023-05-14 21:49:38

by Lorenzo Stoakes

Subject: [PATCH v5 6/6] mm/gup: remove vmas array from internal GUP functions

Now that we have eliminated all callers of GUP APIs which used the vmas
parameter, eliminate it altogether.

This eliminates a class of bugs where vmas might have been kept around
longer than the mmap_lock was held, and means we need no longer be
concerned about locks being dropped during this operation leaving behind
dangling pointers.

This simplifies the GUP API and makes it considerably clearer as to its
purpose - follow flags are applied and if pinning, an array of pages is
returned.

Acked-by: David Hildenbrand <[email protected]>
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
include/linux/hugetlb.h | 10 ++---
mm/gup.c | 83 +++++++++++++++--------------------------
mm/hugetlb.c | 24 +++++-------
3 files changed, 45 insertions(+), 72 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6d041aa9f0fe..b2b698f9a2ec 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -133,9 +133,8 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
- struct page **, struct vm_area_struct **,
- unsigned long *, unsigned long *, long, unsigned int,
- int *);
+ struct page **, unsigned long *, unsigned long *,
+ long, unsigned int, int *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
@@ -306,9 +305,8 @@ static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,

static inline long follow_hugetlb_page(struct mm_struct *mm,
struct vm_area_struct *vma, struct page **pages,
- struct vm_area_struct **vmas, unsigned long *position,
- unsigned long *nr_pages, long i, unsigned int flags,
- int *nonblocking)
+ unsigned long *position, unsigned long *nr_pages,
+ long i, unsigned int flags, int *nonblocking)
{
BUG();
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index 36701b5f0123..dbe96d266670 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1067,8 +1067,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: whether we're still with the mmap_lock held
*
* Returns either number of pages pinned (which may be less than the
@@ -1082,8 +1080,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held. It may be released. See below.
*
* __get_user_pages walks a process's page tables and takes a reference to
@@ -1119,7 +1115,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -1159,9 +1155,9 @@ static long __get_user_pages(struct mm_struct *mm,
goto out;

if (is_vm_hugetlb_page(vma)) {
- i = follow_hugetlb_page(mm, vma, pages, vmas,
- &start, &nr_pages, i,
- gup_flags, locked);
+ i = follow_hugetlb_page(mm, vma, pages,
+ &start, &nr_pages, i,
+ gup_flags, locked);
if (!*locked) {
/*
* We've got a VM_FAULT_RETRY
@@ -1226,10 +1222,6 @@ static long __get_user_pages(struct mm_struct *mm,
ctx.page_mask = 0;
}
next_page:
- if (vmas) {
- vmas[i] = vma;
- ctx.page_mask = 0;
- }
page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
@@ -1384,7 +1376,6 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int flags)
{
@@ -1422,7 +1413,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
pages_done = 0;
for (;;) {
ret = __get_user_pages(mm, start, nr_pages, flags, pages,
- vmas, locked);
+ locked);
if (!(flags & FOLL_UNLOCKABLE)) {
/* VM_FAULT_RETRY couldn't trigger, bypass */
pages_done = ret;
@@ -1486,7 +1477,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,

*locked = 1;
ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
- pages, NULL, locked);
+ pages, locked);
if (!*locked) {
/* Continue to retry until we succeeded */
BUG_ON(ret != 0);
@@ -1584,7 +1575,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* not result in a stack expansion that recurses back here.
*/
ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked ? locked : &local_locked);
+ NULL, locked ? locked : &local_locked);
lru_add_drain();
return ret;
}
@@ -1642,7 +1633,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
return -EINVAL;

ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked);
+ NULL, locked);
lru_add_drain();
return ret;
}
@@ -1710,8 +1701,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
#else /* CONFIG_MMU */
static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
- struct vm_area_struct **vmas, int *locked,
- unsigned int foll_flags)
+ int *locked, unsigned int foll_flags)
{
struct vm_area_struct *vma;
bool must_unlock = false;
@@ -1755,8 +1745,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
if (pages[i])
get_page(pages[i]);
}
- if (vmas)
- vmas[i] = vma;
+
start = (start + PAGE_SIZE) & PAGE_MASK;
}

@@ -1937,8 +1926,7 @@ struct page *get_dump_page(unsigned long addr)
int locked = 0;
int ret;

- ret = __get_user_pages_locked(current->mm, addr, 1, &page, NULL,
- &locked,
+ ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
FOLL_FORCE | FOLL_DUMP | FOLL_GET);
return (ret == 1) ? page : NULL;
}
@@ -2111,7 +2099,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int gup_flags)
{
@@ -2119,13 +2106,13 @@ static long __gup_longterm_locked(struct mm_struct *mm,
long rc, nr_pinned_pages;

if (!(gup_flags & FOLL_LONGTERM))
- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked, gup_flags);

flags = memalloc_pin_save();
do {
nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
- pages, vmas, locked,
+ pages, locked,
gup_flags);
if (nr_pinned_pages <= 0) {
rc = nr_pinned_pages;
@@ -2143,9 +2130,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
* Check that the given flags are valid for the exported gup/pup interface, and
* update them with the required flags that the caller must have set.
*/
-static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
- int *locked, unsigned int *gup_flags_p,
- unsigned int to_set)
+static bool is_valid_gup_args(struct page **pages, int *locked,
+ unsigned int *gup_flags_p, unsigned int to_set)
{
unsigned int gup_flags = *gup_flags_p;

@@ -2187,13 +2173,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
(gup_flags & FOLL_PCI_P2PDMA)))
return false;

- /*
- * Can't use VMAs with locked, as locked allows GUP to unlock
- * which invalidates the vmas array
- */
- if (WARN_ON_ONCE(vmas && (gup_flags & FOLL_UNLOCKABLE)))
- return false;
-
*gup_flags_p = gup_flags;
return true;
}
@@ -2262,11 +2241,11 @@ long get_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;

- return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2301,11 +2280,11 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);

@@ -2329,12 +2308,12 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_TOUCH | FOLL_UNLOCKABLE))
return -EINVAL;

return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages_unlocked);

@@ -3124,7 +3103,7 @@ static int internal_get_user_pages_fast(unsigned long start,
start += nr_pinned << PAGE_SHIFT;
pages += nr_pinned;
ret = __gup_longterm_locked(current->mm, start, nr_pages - nr_pinned,
- pages, NULL, &locked,
+ pages, &locked,
gup_flags | FOLL_TOUCH | FOLL_UNLOCKABLE);
if (ret < 0) {
/*
@@ -3166,7 +3145,7 @@ int get_user_pages_fast_only(unsigned long start, int nr_pages,
* FOLL_FAST_ONLY is required in order to match the API description of
* this routine: no fall back to regular ("slow") GUP.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_GET | FOLL_FAST_ONLY))
return -EINVAL;

@@ -3199,7 +3178,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
* FOLL_GET, because gup fast is always a "pin with a +1 page refcount"
* request.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_GET))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_GET))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3224,7 +3203,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
int pin_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages)
{
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3257,10 +3236,10 @@ long pin_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;

- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -3286,10 +3265,10 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, NULL, &locked, gup_flags);
+ pages, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);

@@ -3303,11 +3282,11 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;

- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_UNLOCKABLE))
return 0;

- return __gup_longterm_locked(current->mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(current->mm, start, nr_pages, pages,
&locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages_unlocked);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f154019e6b84..ea24718db4af 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6425,17 +6425,14 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
}
#endif /* CONFIG_USERFAULTFD */

-static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
- int refs, struct page **pages,
- struct vm_area_struct **vmas)
+static void record_subpages(struct page *page, struct vm_area_struct *vma,
+ int refs, struct page **pages)
{
int nr;

for (nr = 0; nr < refs; nr++) {
if (likely(pages))
pages[nr] = nth_page(page, nr);
- if (vmas)
- vmas[nr] = vma;
}
}

@@ -6508,9 +6505,9 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
}

long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
- struct page **pages, struct vm_area_struct **vmas,
- unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *locked)
+ struct page **pages, unsigned long *position,
+ unsigned long *nr_pages, long i, unsigned int flags,
+ int *locked)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
@@ -6638,7 +6635,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* If subpage information not requested, update counters
* and skip the same_page loop below.
*/
- if (!pages && !vmas && !pfn_offset &&
+ if (!pages && !pfn_offset &&
(vaddr + huge_page_size(h) < vma->vm_end) &&
(remainder >= pages_per_huge_page(h))) {
vaddr += huge_page_size(h);
@@ -6653,11 +6650,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);

- if (pages || vmas)
- record_subpages_vmas(nth_page(page, pfn_offset),
- vma, refs,
- likely(pages) ? pages + i : NULL,
- vmas ? vmas + i : NULL);
+ if (pages)
+ record_subpages(nth_page(page, pfn_offset),
+ vma, refs,
+ likely(pages) ? pages + i : NULL);

if (pages) {
/*
--
2.40.1


2023-05-15 11:56:48

by Christoph Hellwig

Subject: Re: [PATCH v5 2/6] mm/gup: remove unused vmas parameter from pin_user_pages_remote()

On Sun, May 14, 2023 at 10:26:47PM +0100, Lorenzo Stoakes wrote:
> No invocation of pin_user_pages_remote() uses the vmas parameter, so remove
> it. This forms part of a larger patch set eliminating the use of the vmas
> parameters altogether.

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-05-15 12:10:11

by Christoph Hellwig

Subject: Re: [PATCH v5 5/6] mm/gup: remove vmas parameter from pin_user_pages()

On Sun, May 14, 2023 at 10:26:58PM +0100, Lorenzo Stoakes wrote:
> We are now in a position where no caller of pin_user_pages() requires the
> vmas parameter at all, so eliminate this parameter from the function and
> all callers.
>
> This clears the way to removing the vmas parameter from GUP altogether.
>
> Acked-by: David Hildenbrand <[email protected]>
> Acked-by: Dennis Dalessandro <[email protected]> (for qib)
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Looks good:

Reviewed-by: Christoph Hellwig <[email protected]>

2023-05-15 19:13:38

by Sean Christopherson

Subject: Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On Sun, May 14, 2023, Lorenzo Stoakes wrote:
> No invocation of get_user_pages() uses the vmas parameter, so remove it.
>
> The GUP API is confusing and caveated. Recent changes have done much to
> improve that; however, there is more we can do. Exporting vmas is a prime
> target, as the caller has to be extremely careful to preclude their use
> after the mmap_lock has been released or otherwise be left with dangling
> pointers.
>
> Removing the vmas parameter focuses the GUP functions upon their primary
> purpose - pinning (and outputting) pages as well as performing the actions
> implied by the input flags.
>
> This is part of a patch series aiming to remove the vmas parameter
> altogether.
>
> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
> Acked-by: Greg Kroah-Hartman <[email protected]>
> Acked-by: David Hildenbrand <[email protected]>
> Reviewed-by: Jason Gunthorpe <[email protected]>
> Acked-by: Christian König <[email protected]> (for radeon parts)
> Acked-by: Jarkko Sakkinen <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
> drivers/misc/sgi-gru/grufault.c | 2 +-
> include/linux/mm.h | 3 +--
> mm/gup.c | 9 +++------
> mm/gup_test.c | 5 ++---
> virt/kvm/kvm_main.c | 2 +-
> 7 files changed, 10 insertions(+), 15 deletions(-)

Acked-by: Sean Christopherson <[email protected]> (KVM)

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index cb5c13eee193..eaa5bb8dbadc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> {
> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>
> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> + rc = get_user_pages(addr, 1, flags, NULL);
> return rc == -EHWPOISON;

Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
a valid page, KVM will leak the refcount and unintentionally pin the page. That's
highly unlikely as check_user_page_hwpoison() is called iff get_user_pages_unlocked()
fails (called by hva_to_pfn_slow()), but it's theoretically possible that userspace
could change the VMAs between hva_to_pfn_slow() and check_user_page_hwpoison() since
KVM doesn't hold any relevant locks at this point.

E.g. if there's no VMA during hva_to_pfn_{fast,slow}(), npages==-EFAULT and KVM
will invoke check_user_page_hwpoison(). If userspace installs a valid mapping
after hva_to_pfn_slow() but before KVM acquires mmap_lock, then gup() will find
a valid page.

I _think_ the fix is to simply delete this code. The bug was introduced by commit
fafc3dbaac64 ("KVM: Replace is_hwpoison_address with __get_user_pages"). At that
time, KVM didn't check for "npages == -EHWPOISON" from the first call to
get_user_pages_unlocked(). Later on, commit 0857b9e95c1a ("KVM: Enable async page
fault processing") reworked the caller to be:

mmap_read_lock(current->mm);
if (npages == -EHWPOISON ||
(!async && check_user_page_hwpoison(addr))) {
pfn = KVM_PFN_ERR_HWPOISON;
goto exit;
}

where async really means NOWAIT, so that the hwpoison use of gup() didn't sleep.

KVM: Enable async page fault processing

If asynchronous hva_to_pfn() is requested call GUP with FOLL_NOWAIT to
avoid sleeping on IO. Check for hwpoison is done at the same time,
otherwise check_user_page_hwpoison() will call GUP again and will put
vcpu to sleep.

There are other potential problems too, e.g. the hwpoison call doesn't honor
the recently introduced @interruptible flag.

I don't see any reason to keep check_user_page_hwpoison(), KVM can simply rely on
the "npages == -EHWPOISON" check. get_user_pages_unlocked() is guaranteed to be
called with roughly equivalent flags, and the flags that aren't equivalent are
arguably bugs in check_user_page_hwpoison(), e.g. assuming FOLL_WRITE is wrong.
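
I.e., the deletion I have in mind is roughly the following - a sketch of
the minimal change only (whether the mmap_lock is still required at that
point is a separate question):

	mmap_read_lock(current->mm);
	if (npages == -EHWPOISON) {
		pfn = KVM_PFN_ERR_HWPOISON;
		goto exit;
	}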

TL;DR: Go ahead with this change, I'll submit a separate patch to delete the
buggy KVM code.

2023-05-15 19:58:07

by Jens Axboe

Subject: Re: [PATCH v5 4/6] io_uring: rsrc: delegate VMA file-backed check to GUP

On 5/14/23 3:26 PM, Lorenzo Stoakes wrote:
> Now that GUP explicitly checks FOLL_LONGTERM pin_user_pages() against
> broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
> writing to file-backed mappings", there is no need to explicitly check VMAs
> for this condition, so simply remove this logic from io_uring altogether.

Don't have the prerequisite patch handy (not in mainline yet), but if it
just moves the check, then:

Reviewed-by: Jens Axboe <[email protected]>

--
Jens Axboe



2023-05-15 20:09:48

by John Hubbard

Subject: Re: [PATCH v5 6/6] mm/gup: remove vmas array from internal GUP functions

On 5/14/23 14:27, Lorenzo Stoakes wrote:
> Now that we have eliminated all callers of the GUP APIs which use the
> vmas parameter, eliminate it altogether.
>
> This eliminates a class of bugs where vmas might have been kept around
> longer than the mmap_lock was held, and means we need no longer be
> concerned about locks being dropped during this operation and leaving
> behind dangling pointers.
>
> This simplifies the GUP API and makes it considerably clearer as to its
> purpose - follow flags are applied and, if pinning, an array of pages is
> returned.
>
> Acked-by: David Hildenbrand <[email protected]>
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> include/linux/hugetlb.h | 10 ++---
> mm/gup.c | 83 +++++++++++++++--------------------------
> mm/hugetlb.c | 24 +++++-------
> 3 files changed, 45 insertions(+), 72 deletions(-)


Very nice to see this historical baggage get removed!

thanks,
--
John Hubbard
NVIDIA
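
To make the removed bug class concrete: the hazard which the (also now
removed) vmas/FOLL_UNLOCKABLE check in is_valid_gup_args() guarded against
looked roughly like this (an illustrative sketch against the old API, not
code from the tree):

	struct page *pages[1];
	struct vm_area_struct *vmas[1];
	int locked = 1;
	long ret;

	mmap_read_lock(mm);
	/* GUP may drop and retake mmap_lock internally, clearing 'locked' */
	ret = get_user_pages_remote(mm, addr, 1, gup_flags, pages, vmas,
				    &locked);
	if (!locked) {
		/*
		 * mmap_lock was dropped: vmas[0] may now dangle, yet
		 * nothing in the API itself prevented dereferencing it.
		 */
	}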

>
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index 6d041aa9f0fe..b2b698f9a2ec 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -133,9 +133,8 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
> struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> unsigned long address, unsigned int flags);
> long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> - struct page **, struct vm_area_struct **,
> - unsigned long *, unsigned long *, long, unsigned int,
> - int *);
> + struct page **, unsigned long *, unsigned long *,
> + long, unsigned int, int *);
> void unmap_hugepage_range(struct vm_area_struct *,
> unsigned long, unsigned long, struct page *,
> zap_flags_t);
> @@ -306,9 +305,8 @@ static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
>
> static inline long follow_hugetlb_page(struct mm_struct *mm,
> struct vm_area_struct *vma, struct page **pages,
> - struct vm_area_struct **vmas, unsigned long *position,
> - unsigned long *nr_pages, long i, unsigned int flags,
> - int *nonblocking)
> + unsigned long *position, unsigned long *nr_pages,
> + long i, unsigned int flags, int *nonblocking)
> {
> BUG();
> return 0;
> diff --git a/mm/gup.c b/mm/gup.c
> index 36701b5f0123..dbe96d266670 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1067,8 +1067,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> * @pages: array that receives pointers to the pages pinned.
> * Should be at least nr_pages long. Or NULL, if caller
> * only intends to ensure the pages are faulted in.
> - * @vmas: array of pointers to vmas corresponding to each page.
> - * Or NULL if the caller does not require them.
> * @locked: whether we're still with the mmap_lock held
> *
> * Returns either number of pages pinned (which may be less than the
> @@ -1082,8 +1080,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> *
> * The caller is responsible for releasing returned @pages, via put_page().
> *
> - * @vmas are valid only as long as mmap_lock is held.
> - *
> * Must be called with mmap_lock held. It may be released. See below.
> *
> * __get_user_pages walks a process's page tables and takes a reference to
> @@ -1119,7 +1115,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
> static long __get_user_pages(struct mm_struct *mm,
> unsigned long start, unsigned long nr_pages,
> unsigned int gup_flags, struct page **pages,
> - struct vm_area_struct **vmas, int *locked)
> + int *locked)
> {
> long ret = 0, i = 0;
> struct vm_area_struct *vma = NULL;
> @@ -1159,9 +1155,9 @@ static long __get_user_pages(struct mm_struct *mm,
> goto out;
>
> if (is_vm_hugetlb_page(vma)) {
> - i = follow_hugetlb_page(mm, vma, pages, vmas,
> - &start, &nr_pages, i,
> - gup_flags, locked);
> + i = follow_hugetlb_page(mm, vma, pages,
> + &start, &nr_pages, i,
> + gup_flags, locked);
> if (!*locked) {
> /*
> * We've got a VM_FAULT_RETRY
> @@ -1226,10 +1222,6 @@ static long __get_user_pages(struct mm_struct *mm,
> ctx.page_mask = 0;
> }
> next_page:
> - if (vmas) {
> - vmas[i] = vma;
> - ctx.page_mask = 0;
> - }
> page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
> if (page_increm > nr_pages)
> page_increm = nr_pages;
> @@ -1384,7 +1376,6 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
> unsigned long start,
> unsigned long nr_pages,
> struct page **pages,
> - struct vm_area_struct **vmas,
> int *locked,
> unsigned int flags)
> {
> @@ -1422,7 +1413,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
> pages_done = 0;
> for (;;) {
> ret = __get_user_pages(mm, start, nr_pages, flags, pages,
> - vmas, locked);
> + locked);
> if (!(flags & FOLL_UNLOCKABLE)) {
> /* VM_FAULT_RETRY couldn't trigger, bypass */
> pages_done = ret;
> @@ -1486,7 +1477,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
>
> *locked = 1;
> ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
> - pages, NULL, locked);
> + pages, locked);
> if (!*locked) {
> /* Continue to retry until we succeeded */
> BUG_ON(ret != 0);
> @@ -1584,7 +1575,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
> * not result in a stack expansion that recurses back here.
> */
> ret = __get_user_pages(mm, start, nr_pages, gup_flags,
> - NULL, NULL, locked ? locked : &local_locked);
> + NULL, locked ? locked : &local_locked);
> lru_add_drain();
> return ret;
> }
> @@ -1642,7 +1633,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
> return -EINVAL;
>
> ret = __get_user_pages(mm, start, nr_pages, gup_flags,
> - NULL, NULL, locked);
> + NULL, locked);
> lru_add_drain();
> return ret;
> }
> @@ -1710,8 +1701,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
> #else /* CONFIG_MMU */
> static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> unsigned long nr_pages, struct page **pages,
> - struct vm_area_struct **vmas, int *locked,
> - unsigned int foll_flags)
> + int *locked, unsigned int foll_flags)
> {
> struct vm_area_struct *vma;
> bool must_unlock = false;
> @@ -1755,8 +1745,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
> if (pages[i])
> get_page(pages[i]);
> }
> - if (vmas)
> - vmas[i] = vma;
> +
> start = (start + PAGE_SIZE) & PAGE_MASK;
> }
>
> @@ -1937,8 +1926,7 @@ struct page *get_dump_page(unsigned long addr)
> int locked = 0;
> int ret;
>
> - ret = __get_user_pages_locked(current->mm, addr, 1, &page, NULL,
> - &locked,
> + ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
> FOLL_FORCE | FOLL_DUMP | FOLL_GET);
> return (ret == 1) ? page : NULL;
> }
> @@ -2111,7 +2099,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> unsigned long start,
> unsigned long nr_pages,
> struct page **pages,
> - struct vm_area_struct **vmas,
> int *locked,
> unsigned int gup_flags)
> {
> @@ -2119,13 +2106,13 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> long rc, nr_pinned_pages;
>
> if (!(gup_flags & FOLL_LONGTERM))
> - return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
> + return __get_user_pages_locked(mm, start, nr_pages, pages,
> locked, gup_flags);
>
> flags = memalloc_pin_save();
> do {
> nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
> - pages, vmas, locked,
> + pages, locked,
> gup_flags);
> if (nr_pinned_pages <= 0) {
> rc = nr_pinned_pages;
> @@ -2143,9 +2130,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
> * Check that the given flags are valid for the exported gup/pup interface, and
> * update them with the required flags that the caller must have set.
> */
> -static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
> - int *locked, unsigned int *gup_flags_p,
> - unsigned int to_set)
> +static bool is_valid_gup_args(struct page **pages, int *locked,
> + unsigned int *gup_flags_p, unsigned int to_set)
> {
> unsigned int gup_flags = *gup_flags_p;
>
> @@ -2187,13 +2173,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
> (gup_flags & FOLL_PCI_P2PDMA)))
> return false;
>
> - /*
> - * Can't use VMAs with locked, as locked allows GUP to unlock
> - * which invalidates the vmas array
> - */
> - if (WARN_ON_ONCE(vmas && (gup_flags & FOLL_UNLOCKABLE)))
> - return false;
> -
> *gup_flags_p = gup_flags;
> return true;
> }
> @@ -2262,11 +2241,11 @@ long get_user_pages_remote(struct mm_struct *mm,
> {
> int local_locked = 1;
>
> - if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
> + if (!is_valid_gup_args(pages, locked, &gup_flags,
> FOLL_TOUCH | FOLL_REMOTE))
> return -EINVAL;
>
> - return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
> + return __get_user_pages_locked(mm, start, nr_pages, pages,
> locked ? locked : &local_locked,
> gup_flags);
> }
> @@ -2301,11 +2280,11 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
> {
> int locked = 1;
>
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
> + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_TOUCH))
> return -EINVAL;
>
> return __get_user_pages_locked(current->mm, start, nr_pages, pages,
> - NULL, &locked, gup_flags);
> + &locked, gup_flags);
> }
> EXPORT_SYMBOL(get_user_pages);
>
> @@ -2329,12 +2308,12 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> {
> int locked = 0;
>
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
> + if (!is_valid_gup_args(pages, NULL, &gup_flags,
> FOLL_TOUCH | FOLL_UNLOCKABLE))
> return -EINVAL;
>
> return __get_user_pages_locked(current->mm, start, nr_pages, pages,
> - NULL, &locked, gup_flags);
> + &locked, gup_flags);
> }
> EXPORT_SYMBOL(get_user_pages_unlocked);
>
> @@ -3124,7 +3103,7 @@ static int internal_get_user_pages_fast(unsigned long start,
> start += nr_pinned << PAGE_SHIFT;
> pages += nr_pinned;
> ret = __gup_longterm_locked(current->mm, start, nr_pages - nr_pinned,
> - pages, NULL, &locked,
> + pages, &locked,
> gup_flags | FOLL_TOUCH | FOLL_UNLOCKABLE);
> if (ret < 0) {
> /*
> @@ -3166,7 +3145,7 @@ int get_user_pages_fast_only(unsigned long start, int nr_pages,
> * FOLL_FAST_ONLY is required in order to match the API description of
> * this routine: no fall back to regular ("slow") GUP.
> */
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
> + if (!is_valid_gup_args(pages, NULL, &gup_flags,
> FOLL_GET | FOLL_FAST_ONLY))
> return -EINVAL;
>
> @@ -3199,7 +3178,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
> * FOLL_GET, because gup fast is always a "pin with a +1 page refcount"
> * request.
> */
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_GET))
> + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_GET))
> return -EINVAL;
> return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
> }
> @@ -3224,7 +3203,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
> int pin_user_pages_fast(unsigned long start, int nr_pages,
> unsigned int gup_flags, struct page **pages)
> {
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
> + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> return -EINVAL;
> return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
> }
> @@ -3257,10 +3236,10 @@ long pin_user_pages_remote(struct mm_struct *mm,
> {
> int local_locked = 1;
>
> - if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
> + if (!is_valid_gup_args(pages, locked, &gup_flags,
> FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
> return 0;
> - return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
> + return __gup_longterm_locked(mm, start, nr_pages, pages,
> locked ? locked : &local_locked,
> gup_flags);
> }
> @@ -3286,10 +3265,10 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
> {
> int locked = 1;
>
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
> + if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> return 0;
> return __gup_longterm_locked(current->mm, start, nr_pages,
> - pages, NULL, &locked, gup_flags);
> + pages, &locked, gup_flags);
> }
> EXPORT_SYMBOL(pin_user_pages);
>
> @@ -3303,11 +3282,11 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
> {
> int locked = 0;
>
> - if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
> + if (!is_valid_gup_args(pages, NULL, &gup_flags,
> FOLL_PIN | FOLL_TOUCH | FOLL_UNLOCKABLE))
> return 0;
>
> - return __gup_longterm_locked(current->mm, start, nr_pages, pages, NULL,
> + return __gup_longterm_locked(current->mm, start, nr_pages, pages,
> &locked, gup_flags);
> }
> EXPORT_SYMBOL(pin_user_pages_unlocked);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f154019e6b84..ea24718db4af 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6425,17 +6425,14 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> }
> #endif /* CONFIG_USERFAULTFD */
>
> -static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
> - int refs, struct page **pages,
> - struct vm_area_struct **vmas)
> +static void record_subpages(struct page *page, struct vm_area_struct *vma,
> + int refs, struct page **pages)
> {
> int nr;
>
> for (nr = 0; nr < refs; nr++) {
> if (likely(pages))
> pages[nr] = nth_page(page, nr);
> - if (vmas)
> - vmas[nr] = vma;
> }
> }
>
> @@ -6508,9 +6505,9 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
> }
>
> long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> - struct page **pages, struct vm_area_struct **vmas,
> - unsigned long *position, unsigned long *nr_pages,
> - long i, unsigned int flags, int *locked)
> + struct page **pages, unsigned long *position,
> + unsigned long *nr_pages, long i, unsigned int flags,
> + int *locked)
> {
> unsigned long pfn_offset;
> unsigned long vaddr = *position;
> @@ -6638,7 +6635,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> * If subpage information not requested, update counters
> * and skip the same_page loop below.
> */
> - if (!pages && !vmas && !pfn_offset &&
> + if (!pages && !pfn_offset &&
> (vaddr + huge_page_size(h) < vma->vm_end) &&
> (remainder >= pages_per_huge_page(h))) {
> vaddr += huge_page_size(h);
> @@ -6653,11 +6650,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
> refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
> (vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
>
> - if (pages || vmas)
> - record_subpages_vmas(nth_page(page, pfn_offset),
> - vma, refs,
> - likely(pages) ? pages + i : NULL,
> - vmas ? vmas + i : NULL);
> + if (pages)
> + record_subpages(nth_page(page, pfn_offset),
> + vma, refs,
> + likely(pages) ? pages + i : NULL);
>
> if (pages) {
> /*



2023-05-16 08:33:52

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] io_uring: rsrc: delegate VMA file-backed check to GUP

On 14.05.23 23:26, Lorenzo Stoakes wrote:
> Now that GUP explicitly checks FOLL_LONGTERM pin_user_pages() for
> broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
> writing to file-backed mappings", there is no need to explicitly check VMAs
> for this condition, so simply remove this logic from io_uring altogether.
>

Worth adding "Note that this change will make io_uring fixed buffers work
on MAP_PRIVATE file mappings."

I'll run my test cases with this series and expect no surprises :)


Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb


2023-05-16 08:35:11

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] io_uring: rsrc: delegate VMA file-backed check to GUP

On 15.05.23 21:55, Jens Axboe wrote:
> On 5/14/23 3:26 PM, Lorenzo Stoakes wrote:
>> Now that GUP explicitly checks FOLL_LONGTERM pin_user_pages() for
>> broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
>> writing to file-backed mappings", there is no need to explicitly check VMAs
>> for this condition, so simply remove this logic from io_uring altogether.
>
> Don't have the prerequisite patch handy (not in mainline yet), but if it
> just moves the check, then:
>
> Reviewed-by: Jens Axboe <[email protected]>
>

Jens, please see my note regarding io_uring:

https://lore.kernel.org/bpf/[email protected]/

With this patch, MAP_PRIVATE will work as expected (2), but there will
be a change in return code handling (1) that we might have to document
in the man page.

--
Thanks,

David / dhildenb


2023-05-16 09:55:28

by Anders Roxell

[permalink] [raw]
Subject: Re: [PATCH v5 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

On 2023-05-14 22:26, Lorenzo Stoakes wrote:
> The only instances of get_user_pages_remote() invocations which used the
> vmas parameter were for a single page, in which case we can instead simply
> look up the VMA directly. In particular:-
>
> - __update_ref_ctr() looked up the VMA but did nothing with it so we simply
> remove it.
>
> - __access_remote_vm() was already using vma_lookup() when the original
> lookup failed so by doing the lookup directly this also de-duplicates the
> code.
>
> We are able to perform these VMA operations as we already hold the
> mmap_lock in order to be able to call get_user_pages_remote().
>
> As part of this work we add get_user_page_vma_remote() which abstracts the
> VMA lookup, error handling and decrementing the page reference count should
> the VMA lookup fail.
>
> This forms part of a broader set of patches intended to eliminate the vmas
> parameter altogether.
>
> Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
> Acked-by: David Hildenbrand <[email protected]>
> Reviewed-by: Janosch Frank <[email protected]> (for s390)
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
> arch/arm64/kernel/mte.c | 17 +++++++++--------
> arch/s390/kvm/interrupt.c | 2 +-
> fs/exec.c | 2 +-
> include/linux/mm.h | 34 +++++++++++++++++++++++++++++++---
> kernel/events/uprobes.c | 13 +++++--------
> mm/gup.c | 12 ++++--------
> mm/memory.c | 14 +++++++-------
> mm/rmap.c | 2 +-
> security/tomoyo/domain.c | 2 +-
> virt/kvm/async_pf.c | 3 +--
> 10 files changed, 61 insertions(+), 40 deletions(-)
>

[...]

> diff --git a/mm/memory.c b/mm/memory.c
> index 146bb94764f8..63632a5eafc1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5590,7 +5590,6 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
> int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> int len, unsigned int gup_flags)
> {
> - struct vm_area_struct *vma;
> void *old_buf = buf;
> int write = gup_flags & FOLL_WRITE;
>
> @@ -5599,13 +5598,15 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
>
> /* ignore errors, just check how much was successfully transferred */
> while (len) {
> - int bytes, ret, offset;
> + int bytes, offset;
> void *maddr;
> - struct page *page = NULL;
> + struct vm_area_struct *vma;
> + struct page *page = get_user_page_vma_remote(mm, addr,
> + gup_flags, &vma);
> +
> + if (IS_ERR_OR_NULL(page)) {
> + int ret = 0;

I see the warning below when building without CONFIG_HAVE_IOREMAP_PROT set.

make --silent --keep-going --jobs=32 \
O=/home/anders/.cache/tuxmake/builds/1244/build ARCH=arm \
CROSS_COMPILE=arm-linux-gnueabihf-
/home/anders/src/kernel/next/mm/memory.c: In function '__access_remote_vm':
/home/anders/src/kernel/next/mm/memory.c:5608:29: warning: unused variable 'ret' [-Wunused-variable]
5608 | int ret = 0;
| ^~~


>
> - ret = get_user_pages_remote(mm, addr, 1,
> - gup_flags, &page, &vma, NULL);
> - if (ret <= 0) {
> #ifndef CONFIG_HAVE_IOREMAP_PROT
> break;
> #else
> @@ -5613,7 +5614,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> * Check if this is a VM_IO | VM_PFNMAP VMA, which
> * we can access using slightly different code.
> */
> - vma = vma_lookup(mm, addr);
> if (!vma)
> break;
> if (vma->vm_ops && vma->vm_ops->access)

Cheers,
Anders
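
For readers following along, the get_user_page_vma_remote() helper
referred to above might look roughly like this (a sketch reconstructed
from the commit message - the actual helper in include/linux/mm.h may
differ in detail):

	static inline struct page *get_user_page_vma_remote(struct mm_struct *mm,
							    unsigned long addr,
							    int gup_flags,
							    struct vm_area_struct **vmap)
	{
		struct page *page;
		struct vm_area_struct *vma;
		int got = get_user_pages_remote(mm, addr, 1, gup_flags,
						&page, NULL);

		if (got < 0)
			return ERR_PTR(got);
		if (got == 0)
			return NULL;

		/* The caller holds mmap_lock, so the lookup is safe. */
		vma = vma_lookup(mm, addr);
		if (WARN_ON_ONCE(!vma)) {
			/* Don't leak the page reference on failure. */
			put_page(page);
			return ERR_PTR(-EINVAL);
		}

		*vmap = vma;
		return page;
	}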

2023-05-16 10:32:23

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On 15.05.23 21:07, Sean Christopherson wrote:
> On Sun, May 14, 2023, Lorenzo Stoakes wrote:
>> No invocation of get_user_pages() use the vmas parameter, so remove it.
>>
>> The GUP API is confusing and caveated. Recent changes have done much to
>> improve that, however there is more we can do. Exporting vmas is a prime
>> target as the caller has to be extremely careful to preclude their use
>> after the mmap_lock has expired or otherwise be left with dangling
>> pointers.
>>
>> Removing the vmas parameter focuses the GUP functions upon their primary
>> purpose - pinning (and outputting) pages as well as performing the actions
>> implied by the input flags.
>>
>> This is part of a patch series aiming to remove the vmas parameter
>> altogether.
>>
>> Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
>> Acked-by: Greg Kroah-Hartman <[email protected]>
>> Acked-by: David Hildenbrand <[email protected]>
>> Reviewed-by: Jason Gunthorpe <[email protected]>
>> Acked-by: Christian König <[email protected]> (for radeon parts)
>> Acked-by: Jarkko Sakkinen <[email protected]>
>> Signed-off-by: Lorenzo Stoakes <[email protected]>
>> ---
>> arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
>> drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
>> drivers/misc/sgi-gru/grufault.c | 2 +-
>> include/linux/mm.h | 3 +--
>> mm/gup.c | 9 +++------
>> mm/gup_test.c | 5 ++---
>> virt/kvm/kvm_main.c | 2 +-
>> 7 files changed, 10 insertions(+), 15 deletions(-)
>
> Acked-by: Sean Christopherson <[email protected]> (KVM)
>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index cb5c13eee193..eaa5bb8dbadc 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>> {
>> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>>
>> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
>> + rc = get_user_pages(addr, 1, flags, NULL);
>> return rc == -EHWPOISON;
>
> Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
> a valid page, KVM will leak the refcount and unintentionally pin the page. That's

When passing NULL as "pages" to get_user_pages(),
__get_user_pages_locked() won't set FOLL_GET. As FOLL_PIN is also not
set, we won't be messing with the mapcount of the page.

So even if get_user_pages() returns "1", we should be fine.
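
(The logic in question in __get_user_pages_locked() is roughly:

	if (pages && !(flags & FOLL_PIN))
		flags |= FOLL_GET;

i.e. FOLL_GET is only implied when a pages array is actually supplied.)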


Or am I misunderstanding your concern? At least hva_to_pfn_slow() most
certainly didn't return "1" if we end up calling
check_user_page_hwpoison(), so nothing would have been pinned there as well.

--
Thanks,

David / dhildenb


2023-05-16 13:55:48

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH v5 4/6] io_uring: rsrc: delegate VMA file-backed check to GUP

On 5/16/23 2:25 AM, David Hildenbrand wrote:
> On 15.05.23 21:55, Jens Axboe wrote:
>> On 5/14/23 3:26 PM, Lorenzo Stoakes wrote:
>>> Now that GUP explicitly checks FOLL_LONGTERM pin_user_pages() for
>>> broken file-backed mappings in "mm/gup: disallow FOLL_LONGTERM GUP-nonfast
>>> writing to file-backed mappings", there is no need to explicitly check VMAs
>>> for this condition, so simply remove this logic from io_uring altogether.
>>
>> Don't have the prerequisite patch handy (not in mainline yet), but if it
>> just moves the check, then:
>>
>> Reviewed-by: Jens Axboe <[email protected]>
>>
>
> Jens, please see my note regarding iouring:
>
> https://lore.kernel.org/bpf/[email protected]/
>
> With this patch, MAP_PRIVATE will work as expected (2), but there will
> be a change in return code handling (1) that we might have to document
> in the man page.

I think documenting that newer kernels will return -EFAULT rather than
-EOPNOTSUPP should be fine. It's not a new failure case, just a
different error value for an already failing case. Should be fine with
just a doc update. Will do that now.
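
For anyone wanting to verify the behaviour, a minimal reproducer might
look like this (a sketch using liburing; assumes any readable, non-empty
file at the path used):

	#include <fcntl.h>
	#include <liburing.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		struct io_uring ring;
		struct iovec iov;
		int fd, ret;

		fd = open("/etc/hostname", O_RDONLY);
		if (fd < 0)
			return 1;

		/* Private (CoW) file mapping used as a fixed buffer. */
		iov.iov_base = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
				    MAP_PRIVATE, fd, 0);
		iov.iov_len = 4096;
		if (iov.iov_base == MAP_FAILED)
			return 1;

		if (io_uring_queue_init(8, &ring, 0))
			return 1;

		/*
		 * MAP_PRIVATE file-backed buffers now register successfully;
		 * a MAP_SHARED file-backed buffer fails with -EFAULT rather
		 * than the old -EOPNOTSUPP.
		 */
		ret = io_uring_register_buffers(&ring, &iov, 1);
		printf("register: %d\n", ret);

		io_uring_queue_exit(&ring);
		close(fd);
		return 0;
	}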

--
Jens Axboe


2023-05-16 15:05:20

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On 16.05.23 16:30, Sean Christopherson wrote:
> On Tue, May 16, 2023, David Hildenbrand wrote:
>> On 15.05.23 21:07, Sean Christopherson wrote:
>>> On Sun, May 14, 2023, Lorenzo Stoakes wrote:
>>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>>> index cb5c13eee193..eaa5bb8dbadc 100644
>>>> --- a/virt/kvm/kvm_main.c
>>>> +++ b/virt/kvm/kvm_main.c
>>>> @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
>>>> {
>>>> int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
>>>> - rc = get_user_pages(addr, 1, flags, NULL, NULL);
>>>> + rc = get_user_pages(addr, 1, flags, NULL);
>>>> return rc == -EHWPOISON;
>>>
>>> Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
>>> a valid page, KVM will leak the refcount and unintentionally pin the page. That's
>>
>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
>> the mapcount of the page.

For completeness: s/mapcount/refcount/ :)

>
> Ah, that's what I'm missing.



--
Thanks,

David / dhildenb


2023-05-16 15:05:44

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On Tue, May 16, 2023, David Hildenbrand wrote:
> On 15.05.23 21:07, Sean Christopherson wrote:
> > On Sun, May 14, 2023, Lorenzo Stoakes wrote:
> > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > > index cb5c13eee193..eaa5bb8dbadc 100644
> > > --- a/virt/kvm/kvm_main.c
> > > +++ b/virt/kvm/kvm_main.c
> > > @@ -2477,7 +2477,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
> > > {
> > > int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
> > > - rc = get_user_pages(addr, 1, flags, NULL, NULL);
> > > + rc = get_user_pages(addr, 1, flags, NULL);
> > > return rc == -EHWPOISON;
> >
> > Unrelated to this patch, I think there's a pre-existing bug here. If gup() returns
> > a valid page, KVM will leak the refcount and unintentionally pin the page. That's
>
> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
> the mapcount of the page.

Ah, that's what I'm missing.

> So even if get_user_pages() returns "1", we should be fine.
>
>
> Or am I misunderstanding your concern?

Nope, you covered everything. I do think we can drop the extra gup(),
though; AFAICT it's 100% redundant. But it's not a bug.

Thanks!

2023-05-16 17:13:56

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v5 1/6] mm/gup: remove unused vmas parameter from get_user_pages()

On 5/16/23 07:35, David Hildenbrand wrote:
...
>>> When passing NULL as "pages" to get_user_pages(), __get_user_pages_locked()
>>> won't set FOLL_GET. As FOLL_PIN is also not set, we won't be messing with
>>> the mapcount of the page.
>
> For completeness: s/mapcount/refcount/ :)

whew, you had me going there! Now it all adds up. :)

thanks,
--
John Hubbard
NVIDIA


2023-05-16 18:47:06

by Lorenzo Stoakes

[permalink] [raw]
Subject: Re: [PATCH v5 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

On Tue, May 16, 2023 at 11:49:19AM +0200, Anders Roxell wrote:
> On 2023-05-14 22:26, Lorenzo Stoakes wrote:
> > The only instances of get_user_pages_remote() invocations which used the
> > vmas parameter were for a single page, in which case we can instead
> > simply look up the VMA directly. In particular:-
> >
> > - __update_ref_ctr() looked up the VMA but did nothing with it so we simply
> > remove it.
> >
> > - __access_remote_vm() was already using vma_lookup() when the original
> > lookup failed so by doing the lookup directly this also de-duplicates the
> > code.
> >
> > We are able to perform these VMA operations as we already hold the
> > mmap_lock in order to be able to call get_user_pages_remote().
> >
> > As part of this work we add get_user_page_vma_remote() which abstracts the
> > VMA lookup, error handling and decrementing the page reference count should
> > the VMA lookup fail.
> >
> > This forms part of a broader set of patches intended to eliminate the vmas
> > parameter altogether.
> >
> > Reviewed-by: Catalin Marinas <[email protected]> (for arm64)
> > Acked-by: David Hildenbrand <[email protected]>
> > Reviewed-by: Janosch Frank <[email protected]> (for s390)
> > Signed-off-by: Lorenzo Stoakes <[email protected]>
> > ---
> > arch/arm64/kernel/mte.c | 17 +++++++++--------
> > arch/s390/kvm/interrupt.c | 2 +-
> > fs/exec.c | 2 +-
> > include/linux/mm.h | 34 +++++++++++++++++++++++++++++++---
> > kernel/events/uprobes.c | 13 +++++--------
> > mm/gup.c | 12 ++++--------
> > mm/memory.c | 14 +++++++-------
> > mm/rmap.c | 2 +-
> > security/tomoyo/domain.c | 2 +-
> > virt/kvm/async_pf.c | 3 +--
> > 10 files changed, 61 insertions(+), 40 deletions(-)
> >
>
> [...]
>
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 146bb94764f8..63632a5eafc1 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5590,7 +5590,6 @@ EXPORT_SYMBOL_GPL(generic_access_phys);
> > int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> > int len, unsigned int gup_flags)
> > {
> > - struct vm_area_struct *vma;
> > void *old_buf = buf;
> > int write = gup_flags & FOLL_WRITE;
> >
> > @@ -5599,13 +5598,15 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> >
> > /* ignore errors, just check how much was successfully transferred */
> > while (len) {
> > - int bytes, ret, offset;
> > + int bytes, offset;
> > void *maddr;
> > - struct page *page = NULL;
> > + struct vm_area_struct *vma;
> > + struct page *page = get_user_page_vma_remote(mm, addr,
> > + gup_flags, &vma);
> > +
> > + if (IS_ERR_OR_NULL(page)) {
> > + int ret = 0;
>
> I see the warning below when building without CONFIG_HAVE_IOREMAP_PROT set.
>
> make --silent --keep-going --jobs=32 \
> O=/home/anders/.cache/tuxmake/builds/1244/build ARCH=arm \
> CROSS_COMPILE=arm-linux-gnueabihf-
> /home/anders/src/kernel/next/mm/memory.c: In function '__access_remote_vm':
> /home/anders/src/kernel/next/mm/memory.c:5608:29: warning: unused variable 'ret' [-Wunused-variable]
> 5608 | int ret = 0;
> | ^~~
>

Ah damn, nice spot thanks!

>
> >
> > - ret = get_user_pages_remote(mm, addr, 1,
> > - gup_flags, &page, &vma, NULL);
> > - if (ret <= 0) {
> > #ifndef CONFIG_HAVE_IOREMAP_PROT
> > break;
> > #else
> > @@ -5613,7 +5614,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> > * Check if this is a VM_IO | VM_PFNMAP VMA, which
> > * we can access using slightly different code.
> > */
> > - vma = vma_lookup(mm, addr);
> > if (!vma)
> > break;
> > if (vma->vm_ops && vma->vm_ops->access)
>
> Cheers,
> Anders

I enclose a -fix patch for this below:-

----8<----
From 6a4bb033a1ec60920e4945e7e063443f91489d06 Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <[email protected]>
Date: Tue, 16 May 2023 19:16:22 +0100
Subject: [PATCH] mm/gup: remove vmas parameter from get_user_pages_remote()

Fix unused variable warning as reported by Anders Roxell.

Signed-off-by: Lorenzo Stoakes <[email protected]>

---
mm/memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 63632a5eafc1..b1b25e61294a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5605,11 +5605,11 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
gup_flags, &vma);

if (IS_ERR_OR_NULL(page)) {
- int ret = 0;
-
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
#else
+ int ret = 0;
+
/*
* Check if this is a VM_IO | VM_PFNMAP VMA, which
* we can access using slightly different code.
--
2.40.1

2023-05-16 22:24:42

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH v5 3/6] mm/gup: remove vmas parameter from get_user_pages_remote()

On Tue, 16 May 2023 11:49:19 +0200 Anders Roxell <[email protected]> wrote:

> On 2023-05-14 22:26, Lorenzo Stoakes wrote:
> > The only instances of get_user_pages_remote() invocations which used the
> > vmas parameter were for a single page, in which case we can instead
> > simply look up the VMA directly. In particular:-
> >
> > - __update_ref_ctr() looked up the VMA but did nothing with it so we simply
> > remove it.
> >
> > - __access_remote_vm() was already using vma_lookup() when the original
> > lookup failed so by doing the lookup directly this also de-duplicates the
> > code.
> >
> > We are able to perform these VMA operations as we already hold the
> > mmap_lock in order to be able to call get_user_pages_remote().
> >
> > As part of this work we add get_user_page_vma_remote() which abstracts the
> > VMA lookup, error handling and decrementing the page reference count should
> > the VMA lookup fail.
> >
> > This forms part of a broader set of patches intended to eliminate the vmas
> > parameter altogether.
> >
> > - int bytes, ret, offset;
> > + int bytes, offset;
> > void *maddr;
> > - struct page *page = NULL;
> > + struct vm_area_struct *vma;
> > + struct page *page = get_user_page_vma_remote(mm, addr,
> > + gup_flags, &vma);
> > +
> > + if (IS_ERR_OR_NULL(page)) {
> > + int ret = 0;
>
> I see the warning below when building without CONFIG_HAVE_IOREMAP_PROT set.
>
> make --silent --keep-going --jobs=32 \
> O=/home/anders/.cache/tuxmake/builds/1244/build ARCH=arm \
> CROSS_COMPILE=arm-linux-gnueabihf-
> /home/anders/src/kernel/next/mm/memory.c: In function '__access_remote_vm':
> /home/anders/src/kernel/next/mm/memory.c:5608:29: warning: unused variable 'ret' [-Wunused-variable]
> 5608 | int ret = 0;
> | ^~~

Thanks, I did the obvious.

Also s/ret/res/, as `ret' is kinda reserved for "this is what this
function will return".

--- a/mm/memory.c~mm-gup-remove-vmas-parameter-from-get_user_pages_remote-fix
+++ a/mm/memory.c
@@ -5605,11 +5605,11 @@ int __access_remote_vm(struct mm_struct
gup_flags, &vma);

if (IS_ERR_OR_NULL(page)) {
- int ret = 0;
-
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
#else
+ int res = 0;
+
/*
* Check if this is a VM_IO | VM_PFNMAP VMA, which
* we can access using slightly different code.
@@ -5617,11 +5617,11 @@ int __access_remote_vm(struct mm_struct
if (!vma)
break;
if (vma->vm_ops && vma->vm_ops->access)
- ret = vma->vm_ops->access(vma, addr, buf,
+ res = vma->vm_ops->access(vma, addr, buf,
len, write);
- if (ret <= 0)
+ if (res <= 0)
break;
- bytes = ret;
+ bytes = res;
#endif
} else {
bytes = len;
_


2023-05-17 13:22:00

by Sakari Ailus

[permalink] [raw]
Subject: Re: [PATCH v5 5/6] mm/gup: remove vmas parameter from pin_user_pages()

On Sun, May 14, 2023 at 10:26:58PM +0100, Lorenzo Stoakes wrote:
> We are now in a position where no caller of pin_user_pages() requires the
> vmas parameter at all, so eliminate this parameter from the function and
> all callers.
>
> This clears the way to removing the vmas parameter from GUP altogether.
>
> Acked-by: David Hildenbrand <[email protected]>
> Acked-by: Dennis Dalessandro <[email protected]> (for qib)
> Signed-off-by: Lorenzo Stoakes <[email protected]>

Acked-by: Sakari Ailus <[email protected]> # drivers/media

--
Sakari Ailus