(pin_/get)_user_pages[_remote]() each provide an optional output parameter
for an array of VMA objects associated with each page in the input range.
These provide the means for VMAs to be returned, as long as mm->mmap_lock
is never released during the GUP operation (i.e. the internal flag
FOLL_UNLOCKABLE is not specified).
In addition, these VMAs have also to only be accessed under the mmap_lock,
and become invalidated the moment it is released.
The vast majority of invocations do not use this functionality and of those
that do, all but one retrieve a single VMA to perform checks upon.
It is not egregious in the single VMA cases to simply replace the operation
with a vma_lookup(). In these cases we duplicate the (fast) lookup on a
slow path already under the mmap_lock.
The special case is io_uring, where io_pin_pages() specifically needs to
assert that all the VMAs possess the same vm->vm_file (possibly NULL) and
they are either anonymous or hugetlb pages.
To continue to provide this functionality, we introduce the FOLL_SAME_PAGE
flag which asserts that the vma->vm_file remains the same throughout,
erroring out if this is not the case.
We can then replace the io_uring case by passing FOLL_SAME_FILE and looking
up the first VMA manually and performing the required checks on this
alone. The combination of the two amount to the same checks being
performed (and avoids an allocation).
Eliminating this parameter eliminates an entire class of errors - the vmas
array used to become a set of dangling pointers if access after release of
mmap_lock was attempted, this is simply no longer possible.
In addition the API is simplified and now clearly expresses what it is for
- applying the specified GUP flags and (if pinning) returning pinned pages.
This change additionally opens the door to further potential improvements
in GUP and the possible marrying of disparate code paths.
I have run the gup_test and a simple io_uring program which exercises the
use of FOLL_SAME_PAGE with no issues.
Thanks to Matthew Wilcox for suggesting this refactoring!
v2:
- Only lookup the VMA if the pin succeeded (other than __access_remote_vm()
which has different semantics)
- Be pedantically careful about ensuring that under no circumstances can we
fail to unpin a page
v1:
https://lore.kernel.org/linux-mm/[email protected]/
Lorenzo Stoakes (7):
mm/gup: remove unused vmas parameter from get_user_pages()
mm/gup: remove unused vmas parameter from pin_user_pages_remote()
mm/gup: remove vmas parameter from get_user_pages_remote()
mm/gup: introduce the FOLL_SAME_FILE GUP flag
io_uring: rsrc: use FOLL_SAME_FILE on pin_user_pages()
mm/gup: remove vmas parameter from pin_user_pages()
mm/gup: remove vmas array from internal GUP functions
arch/arm64/kernel/mte.c | 7 +-
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/iommu/iommufd/pages.c | 4 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
fs/exec.c | 2 +-
include/linux/hugetlb.h | 10 +-
include/linux/mm.h | 10 +-
include/linux/mm_types.h | 2 +
io_uring/rsrc.c | 39 +++----
kernel/events/uprobes.c | 12 +-
mm/gup.c | 121 ++++++++-------------
mm/gup_test.c | 14 +--
mm/hugetlb.c | 24 ++--
mm/memory.c | 9 +-
mm/process_vm_access.c | 2 +-
mm/rmap.c | 2 +-
net/xdp/xdp_umem.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +-
virt/kvm/kvm_main.c | 2 +-
30 files changed, 127 insertions(+), 164 deletions(-)
--
2.40.0
No invocation of get_user_pages() uses the vmas parameter, so remove
it.
The GUP API is confusing and caveated. Recent changes have done much to
improve that, however there is more we can do. Exporting vmas is a prime
target as the caller has to be extremely careful to preclude their use
after the mmap_lock has expired or otherwise be left with dangling
pointers.
Removing the vmas parameter focuses the GUP functions upon their primary
purpose - pinning (and outputting) pages as well as performing the actions
implied by the input flags.
This is part of a patch series aiming to remove the vmas parameter
altogether.
Signed-off-by: Lorenzo Stoakes <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
---
arch/x86/kernel/cpu/sgx/ioctl.c | 2 +-
drivers/gpu/drm/radeon/radeon_ttm.c | 2 +-
drivers/misc/sgi-gru/grufault.c | 2 +-
include/linux/mm.h | 3 +--
mm/gup.c | 9 +++------
mm/gup_test.c | 5 ++---
virt/kvm/kvm_main.c | 2 +-
7 files changed, 10 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index 21ca0a831b70..5d390df21440 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -214,7 +214,7 @@ static int __sgx_encl_add_page(struct sgx_encl *encl,
if (!(vma->vm_flags & VM_MAYEXEC))
return -EACCES;
- ret = get_user_pages(src, 1, 0, &src_page, NULL);
+ ret = get_user_pages(src, 1, 0, &src_page);
if (ret < 1)
return -EFAULT;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 1e8e287e113c..0597540f0dde 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -362,7 +362,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_device *bdev, struct ttm_tt *ttm
struct page **pages = ttm->pages + pinned;
r = get_user_pages(userptr, num_pages, write ? FOLL_WRITE : 0,
- pages, NULL);
+ pages);
if (r < 0)
goto release_pages;
diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index b836936e9747..378cf02a2aa1 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -185,7 +185,7 @@ static int non_atomic_pte_lookup(struct vm_area_struct *vma,
#else
*pageshift = PAGE_SHIFT;
#endif
- if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page, NULL) <= 0)
+ if (get_user_pages(vaddr, 1, write ? FOLL_WRITE : 0, &page) <= 0)
return -EFAULT;
*paddr = page_to_phys(page);
put_page(page);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 37554b08bb28..b14cc4972d0b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2380,8 +2380,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas);
diff --git a/mm/gup.c b/mm/gup.c
index 1f72a717232b..7e454d6b157e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2251,8 +2251,6 @@ long get_user_pages_remote(struct mm_struct *mm,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* This is the same as get_user_pages_remote(), just with a less-flexible
* calling convention where we assume that the mm being operated on belongs to
@@ -2260,16 +2258,15 @@ long get_user_pages_remote(struct mm_struct *mm,
* obviously don't pass FOLL_REMOTE in here.
*/
long get_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;
- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;
return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- vmas, &locked, gup_flags);
+ NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);
diff --git a/mm/gup_test.c b/mm/gup_test.c
index 8ae7307a1bb6..9ba8ea23f84e 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -139,8 +139,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case GUP_BASIC_TEST:
- nr = get_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = get_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_FAST_BENCHMARK:
nr = pin_user_pages_fast(addr, nr, gup->gup_flags,
@@ -161,7 +160,7 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i, NULL);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
break;
default:
ret = -EINVAL;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d255964ec331..7f31e0a4adb5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2474,7 +2474,7 @@ static inline int check_user_page_hwpoison(unsigned long addr)
{
int rc, flags = FOLL_HWPOISON | FOLL_WRITE;
- rc = get_user_pages(addr, 1, flags, NULL, NULL);
+ rc = get_user_pages(addr, 1, flags, NULL);
return rc == -EHWPOISON;
}
--
2.40.0
No invocation of pin_user_pages_remote() uses the vmas parameter, so remove
it. This forms part of a larger patch set eliminating the use of the vmas
parameters altogether.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
drivers/iommu/iommufd/pages.c | 4 ++--
drivers/vfio/vfio_iommu_type1.c | 2 +-
include/linux/mm.h | 2 +-
mm/gup.c | 8 +++-----
mm/process_vm_access.c | 2 +-
5 files changed, 8 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/iommufd/pages.c b/drivers/iommu/iommufd/pages.c
index f8d92c9bb65b..9d55a2188a64 100644
--- a/drivers/iommu/iommufd/pages.c
+++ b/drivers/iommu/iommufd/pages.c
@@ -786,7 +786,7 @@ static int pfn_reader_user_pin(struct pfn_reader_user *user,
user->locked = 1;
}
rc = pin_user_pages_remote(pages->source_mm, uptr, npages,
- user->gup_flags, user->upages, NULL,
+ user->gup_flags, user->upages,
&user->locked);
}
if (rc <= 0) {
@@ -1787,7 +1787,7 @@ static int iopt_pages_rw_page(struct iopt_pages *pages, unsigned long index,
rc = pin_user_pages_remote(
pages->source_mm, (uintptr_t)(pages->uptr + index * PAGE_SIZE),
1, (flags & IOMMUFD_ACCESS_RW_WRITE) ? FOLL_WRITE : 0, &page,
- NULL, NULL);
+ NULL);
mmap_read_unlock(pages->source_mm);
if (rc != 1) {
if (WARN_ON(rc >= 0))
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 493c31de0edb..e6dc8fec3ed5 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -562,7 +562,7 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
mmap_read_lock(mm);
ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
- pages, NULL, NULL);
+ pages, NULL);
if (ret > 0) {
int i;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b14cc4972d0b..ec9875c59f6d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2378,7 +2378,7 @@ long get_user_pages_remote(struct mm_struct *mm,
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ int *locked);
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
diff --git a/mm/gup.c b/mm/gup.c
index 7e454d6b157e..931c805bc32b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3093,8 +3093,6 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -3109,14 +3107,14 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;
- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, vmas,
+ return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 78dfaf9e8990..0523edab03a6 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -104,7 +104,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
mmap_read_lock(mm);
pinned_pages = pin_user_pages_remote(mm, pa, pinned_pages,
flags, process_pages,
- NULL, &locked);
+ &locked);
if (locked)
mmap_read_unlock(mm);
if (pinned_pages <= 0)
--
2.40.0
The only instances of get_user_pages_remote() invocations which used the
vmas parameter were for a single page which can instead simply look up the
VMA directly. In particular:-
- __update_ref_ctr() looked up the VMA but did nothing with it so we simply
remove it.
- __access_remote_vm() was already using vma_lookup() when the original
lookup failed so by doing the lookup directly this also de-duplicates the
code.
This forms part of a broader set of patches intended to eliminate the vmas
parameter altogether.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/arm64/kernel/mte.c | 7 ++++---
arch/s390/kvm/interrupt.c | 2 +-
fs/exec.c | 2 +-
include/linux/mm.h | 2 +-
kernel/events/uprobes.c | 12 +++++++-----
mm/gup.c | 12 ++++--------
mm/memory.c | 9 +++++----
mm/rmap.c | 2 +-
security/tomoyo/domain.c | 2 +-
virt/kvm/async_pf.c | 3 +--
10 files changed, 26 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..d43a744d7919 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -419,7 +419,6 @@ long get_mte_ctrl(struct task_struct *task)
static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
struct iovec *kiov, unsigned int gup_flags)
{
- struct vm_area_struct *vma;
void __user *buf = kiov->iov_base;
size_t len = kiov->iov_len;
int ret;
@@ -432,12 +431,13 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
return -EIO;
while (len) {
+ struct vm_area_struct *vma;
unsigned long tags, offset;
void *maddr;
struct page *page = NULL;
ret = get_user_pages_remote(mm, addr, 1, gup_flags, &page,
- &vma, NULL);
+ NULL);
if (ret <= 0)
break;
@@ -448,7 +448,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
* would cause the existing tags to be cleared if the page
* was never mapped with PROT_MTE.
*/
- if (!(vma->vm_flags & VM_MTE)) {
+ vma = vma_lookup(mm, addr);
+ if (!vma || !(vma->vm_flags & VM_MTE)) {
ret = -EOPNOTSUPP;
put_page(page);
break;
diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 9250fde1f97d..c19d0cb7d2f2 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -2777,7 +2777,7 @@ static struct page *get_map_page(struct kvm *kvm, u64 uaddr)
mmap_read_lock(kvm->mm);
get_user_pages_remote(kvm->mm, uaddr, 1, FOLL_WRITE,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(kvm->mm);
return page;
}
diff --git a/fs/exec.c b/fs/exec.c
index 87cf3a2f0e9a..d8d48ee15aac 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -219,7 +219,7 @@ static struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1, gup_flags,
- &page, NULL, NULL);
+ &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return NULL;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index ec9875c59f6d..1bfe73a2b6d3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2374,7 +2374,7 @@ extern int __access_remote_vm(struct mm_struct *mm, unsigned long addr,
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked);
+ int *locked);
long pin_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 59887c69d54c..b21993cd2dcc 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -365,7 +365,6 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
{
void *kaddr;
struct page *page;
- struct vm_area_struct *vma;
int ret;
short *ptr;
@@ -373,7 +372,7 @@ __update_ref_ctr(struct mm_struct *mm, unsigned long vaddr, short d)
return -EINVAL;
ret = get_user_pages_remote(mm, vaddr, 1,
- FOLL_WRITE, &page, &vma, NULL);
+ FOLL_WRITE, &page, NULL);
if (unlikely(ret <= 0)) {
/*
* We are asking for 1 page. If get_user_pages_remote() fails,
@@ -475,10 +474,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
gup_flags |= FOLL_SPLIT_PMD;
/* Read the page with vaddr into memory */
ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
- &old_page, &vma, NULL);
+ &old_page, NULL);
if (ret <= 0)
return ret;
+ vma = vma_lookup(mm, vaddr);
+ if (!vma)
+ goto put_old;
+
ret = verify_opcode(old_page, vaddr, &opcode);
if (ret <= 0)
goto put_old;
@@ -2027,8 +2030,7 @@ static int is_trap_at_addr(struct mm_struct *mm, unsigned long vaddr)
* but we treat this as a 'remote' access since it is
* essentially a kernel access to the memory.
*/
- result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page,
- NULL, NULL);
+ result = get_user_pages_remote(mm, vaddr, 1, FOLL_FORCE, &page, NULL);
if (result < 0)
return result;
diff --git a/mm/gup.c b/mm/gup.c
index 931c805bc32b..9440aa54c741 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2165,8 +2165,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: pointer to lock flag indicating whether lock is held and
* subsequently whether VM_FAULT_RETRY functionality can be
* utilised. Lock must initially be held.
@@ -2181,8 +2179,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held for read or write.
*
* get_user_pages_remote walks a process's page tables and takes a reference
@@ -2219,15 +2215,15 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
int local_locked = 1;
- if (!is_valid_gup_args(pages, vmas, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;
- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2237,7 +2233,7 @@ EXPORT_SYMBOL(get_user_pages_remote);
long get_user_pages_remote(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
return 0;
}
diff --git a/mm/memory.c b/mm/memory.c
index 8ddb10199e8d..913e693322f2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5591,7 +5591,9 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
struct page *page = NULL;
ret = get_user_pages_remote(mm, addr, 1,
- gup_flags, &page, &vma, NULL);
+ gup_flags, &page, NULL);
+ vma = vma_lookup(mm, addr);
+
if (ret <= 0) {
#ifndef CONFIG_HAVE_IOREMAP_PROT
break;
@@ -5600,7 +5602,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
* Check if this is a VM_IO | VM_PFNMAP VMA, which
* we can access using slightly different code.
*/
- vma = vma_lookup(mm, addr);
if (!vma)
break;
if (vma->vm_ops && vma->vm_ops->access)
@@ -5617,11 +5618,11 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
bytes = PAGE_SIZE-offset;
maddr = kmap(page);
- if (write) {
+ if (write && vma) {
copy_to_user_page(vma, page, addr,
maddr + offset, buf, bytes);
set_page_dirty_lock(page);
- } else {
+ } else if (vma) {
copy_from_user_page(vma, page, addr,
buf, maddr + offset, bytes);
}
diff --git a/mm/rmap.c b/mm/rmap.c
index ba901c416785..756ea8a9bb90 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2324,7 +2324,7 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
npages = get_user_pages_remote(mm, start, npages,
FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD,
- pages, NULL, NULL);
+ pages, NULL);
if (npages < 0)
return npages;
diff --git a/security/tomoyo/domain.c b/security/tomoyo/domain.c
index 31af29f669d2..ac20c0bdff9d 100644
--- a/security/tomoyo/domain.c
+++ b/security/tomoyo/domain.c
@@ -916,7 +916,7 @@ bool tomoyo_dump_page(struct linux_binprm *bprm, unsigned long pos,
*/
mmap_read_lock(bprm->mm);
ret = get_user_pages_remote(bprm->mm, pos, 1,
- FOLL_FORCE, &page, NULL, NULL);
+ FOLL_FORCE, &page, NULL);
mmap_read_unlock(bprm->mm);
if (ret <= 0)
return false;
diff --git a/virt/kvm/async_pf.c b/virt/kvm/async_pf.c
index 9bfe1d6f6529..e033c79d528e 100644
--- a/virt/kvm/async_pf.c
+++ b/virt/kvm/async_pf.c
@@ -61,8 +61,7 @@ static void async_pf_execute(struct work_struct *work)
* access remotely.
*/
mmap_read_lock(mm);
- get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, NULL,
- &locked);
+ get_user_pages_remote(mm, addr, 1, FOLL_WRITE, NULL, &locked);
if (locked)
mmap_read_unlock(mm);
--
2.40.0
This flag causes GUP to assert that all VMAs within the input range possess
the same vma->vm_file. If not, the operation fails.
This is part of a patch series which eliminates the vmas parameter from the
GUP API, implementing the one remaining assertion within the entire kernel
that requires access to the VMAs associated with a GUP range.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
include/linux/mm_types.h | 2 ++
mm/gup.c | 16 ++++++++++++----
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 3fc9e680f174..84d1aec9dbab 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1185,6 +1185,8 @@ enum {
FOLL_PCI_P2PDMA = 1 << 10,
/* allow interrupts from generic signals */
FOLL_INTERRUPTIBLE = 1 << 11,
+ /* assert that the range spans VMAs with the same vma->vm_file */
+ FOLL_SAME_FILE = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */
};
diff --git a/mm/gup.c b/mm/gup.c
index 9440aa54c741..3954ce499a4a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -959,7 +959,8 @@ static int faultin_page(struct vm_area_struct *vma,
return 0;
}
-static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
+static int check_vma_flags(struct vm_area_struct *vma, struct file *file,
+ unsigned long gup_flags)
{
vm_flags_t vm_flags = vma->vm_flags;
int write = (gup_flags & FOLL_WRITE);
@@ -968,7 +969,7 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if (vm_flags & (VM_IO | VM_PFNMAP))
return -EFAULT;
- if (gup_flags & FOLL_ANON && !vma_is_anonymous(vma))
+ if ((gup_flags & FOLL_ANON) && !vma_is_anonymous(vma))
return -EFAULT;
if ((gup_flags & FOLL_LONGTERM) && vma_is_fsdax(vma))
@@ -977,6 +978,9 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
if (vma_is_secretmem(vma))
return -EFAULT;
+ if ((gup_flags & FOLL_SAME_FILE) && vma->vm_file != file)
+ return -EFAULT;
+
if (write) {
if (!(vm_flags & VM_WRITE)) {
if (!(gup_flags & FOLL_FORCE))
@@ -1081,6 +1085,7 @@ static long __get_user_pages(struct mm_struct *mm,
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
struct follow_page_context ctx = { NULL };
+ struct file *file = NULL;
if (!nr_pages)
return 0;
@@ -1111,10 +1116,13 @@ static long __get_user_pages(struct mm_struct *mm,
ret = -EFAULT;
goto out;
}
- ret = check_vma_flags(vma, gup_flags);
+ ret = check_vma_flags(vma, i == 0 ? vma->vm_file : file,
+ gup_flags);
if (ret)
goto out;
+ file = vma->vm_file;
+
if (is_vm_hugetlb_page(vma)) {
i = follow_hugetlb_page(mm, vma, pages, vmas,
&start, &nr_pages, i,
@@ -1595,7 +1603,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
* We want to report -EINVAL instead of -EFAULT for any permission
* problems or incompatible mappings.
*/
- if (check_vma_flags(vma, gup_flags))
+ if (check_vma_flags(vma, vma->vm_file, gup_flags))
return -EINVAL;
ret = __get_user_pages(mm, start, nr_pages, gup_flags,
--
2.40.0
Commit edd478269640 ("io_uring/rsrc: disallow multi-source reg buffers")
prevents io_pin_pages() from pinning pages spanning multiple VMAs with
permitted characteristics (anon/huge), requiring that all VMAs share the
same vm_file.
The newly introduced FOLL_SAME_FILE flag permits this to be expressed as a
GUP flag rather than having to retrieve VMAs to perform the check.
We then only need to perform a VMA lookup for the first VMA to assert the
anon/hugepage requirement as we know the rest of the VMAs will possess the
same characteristics.
Doing this eliminates the one instance of vmas being used by
pin_user_pages().
Signed-off-by: Lorenzo Stoakes <[email protected]>
Suggested-by: Matthew Wilcox (Oracle) <[email protected]>
---
io_uring/rsrc.c | 39 ++++++++++++++++-----------------------
1 file changed, 16 insertions(+), 23 deletions(-)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index 7a43aed8e395..adc860bcbd4f 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1141,9 +1141,8 @@ static int io_buffer_account_pin(struct io_ring_ctx *ctx, struct page **pages,
struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
{
unsigned long start, end, nr_pages;
- struct vm_area_struct **vmas = NULL;
struct page **pages = NULL;
- int i, pret, ret = -ENOMEM;
+ int pret, ret = -ENOMEM;
end = (ubuf + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
start = ubuf >> PAGE_SHIFT;
@@ -1153,31 +1152,26 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
if (!pages)
goto done;
- vmas = kvmalloc_array(nr_pages, sizeof(struct vm_area_struct *),
- GFP_KERNEL);
- if (!vmas)
- goto done;
-
ret = 0;
mmap_read_lock(current->mm);
- pret = pin_user_pages(ubuf, nr_pages, FOLL_WRITE | FOLL_LONGTERM,
- pages, vmas);
+
+ pret = pin_user_pages(ubuf, nr_pages,
+ FOLL_WRITE | FOLL_LONGTERM | FOLL_SAME_FILE,
+ pages, NULL);
if (pret == nr_pages) {
- struct file *file = vmas[0]->vm_file;
+ /*
+ * lookup the first VMA, we require that all VMAs in range
+ * maintain the same file characteristics, as enforced by
+ * FOLL_SAME_FILE
+ */
+ struct vm_area_struct *vma = vma_lookup(current->mm, ubuf);
+ struct file *file;
/* don't support file backed memory */
- for (i = 0; i < nr_pages; i++) {
- if (vmas[i]->vm_file != file) {
- ret = -EINVAL;
- break;
- }
- if (!file)
- continue;
- if (!vma_is_shmem(vmas[i]) && !is_file_hugepages(file)) {
- ret = -EOPNOTSUPP;
- break;
- }
- }
+ file = vma->vm_file;
+ if (file && !vma_is_shmem(vma) && !is_file_hugepages(file))
+ ret = -EOPNOTSUPP;
+
*npages = nr_pages;
} else {
ret = pret < 0 ? pret : -EFAULT;
@@ -1194,7 +1188,6 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
}
ret = 0;
done:
- kvfree(vmas);
if (ret < 0) {
kvfree(pages);
pages = ERR_PTR(ret);
--
2.40.0
After the introduction of FOLL_SAME_FILE we no longer require vmas for any
invocation of pin_user_pages(), so eliminate this parameter from the
function and all callers.
This clears the way to removing the vmas parameter from GUP altogether.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
arch/powerpc/mm/book3s64/iommu_api.c | 2 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 2 +-
drivers/infiniband/sw/siw/siw_mem.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 2 +-
drivers/vdpa/vdpa_user/vduse_dev.c | 2 +-
drivers/vhost/vdpa.c | 2 +-
include/linux/mm.h | 3 +--
io_uring/rsrc.c | 2 +-
mm/gup.c | 9 +++------
mm/gup_test.c | 9 ++++-----
net/xdp/xdp_umem.c | 2 +-
12 files changed, 17 insertions(+), 22 deletions(-)
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 81d7185e2ae8..d19fb1f3007d 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -105,7 +105,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
FOLL_WRITE | FOLL_LONGTERM,
- mem->hpages + entry, NULL);
+ mem->hpages + entry);
if (ret == n) {
pinned += n;
continue;
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index f693bc753b6b..1bb7507325bc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -111,7 +111,7 @@ int qib_get_user_pages(unsigned long start_page, size_t num_pages,
ret = pin_user_pages(start_page + got * PAGE_SIZE,
num_pages - got,
FOLL_LONGTERM | FOLL_WRITE,
- p + got, NULL);
+ p + got);
if (ret < 0) {
mmap_read_unlock(current->mm);
goto bail_release;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 2a5cac2658ec..84e0f41e7dfa 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -140,7 +140,7 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
ret = pin_user_pages(cur_base,
min_t(unsigned long, npages,
PAGE_SIZE / sizeof(struct page *)),
- gup_flags, page_list, NULL);
+ gup_flags, page_list);
if (ret < 0)
goto out;
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index f51ab2ccf151..e6e25f15567d 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -422,7 +422,7 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
umem->page_chunk[i].plist = plist;
while (nents) {
rv = pin_user_pages(first_page_va, nents, foll_flags,
- plist, NULL);
+ plist);
if (rv < 0)
goto out_sem_up;
diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 53001532e8e3..405b89ea1054 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -180,7 +180,7 @@ static int videobuf_dma_init_user_locked(struct videobuf_dmabuf *dma,
data, size, dma->nr_pages);
err = pin_user_pages(data & PAGE_MASK, dma->nr_pages, gup_flags,
- dma->pages, NULL);
+ dma->pages);
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c
index 0c3b48616a9f..1f80254604f0 100644
--- a/drivers/vdpa/vdpa_user/vduse_dev.c
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -995,7 +995,7 @@ static int vduse_dev_reg_umem(struct vduse_dev *dev,
goto out;
pinned = pin_user_pages(uaddr, npages, FOLL_LONGTERM | FOLL_WRITE,
- page_list, NULL);
+ page_list);
if (pinned != npages) {
ret = pinned < 0 ? pinned : -ENOMEM;
goto out;
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 7be9d9d8f01c..4317128c1c62 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -952,7 +952,7 @@ static int vhost_vdpa_pa_map(struct vhost_vdpa *v,
while (npages) {
sz2pin = min_t(unsigned long, npages, list_size);
pinned = pin_user_pages(cur_base, sz2pin,
- gup_flags, page_list, NULL);
+ gup_flags, page_list);
if (sz2pin != pinned) {
if (pinned < 0) {
ret = pinned;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1bfe73a2b6d3..363e3d0d46f4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2382,8 +2382,7 @@ long pin_user_pages_remote(struct mm_struct *mm,
long get_user_pages(unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages);
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas);
+ unsigned int gup_flags, struct page **pages);
long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
struct page **pages, unsigned int gup_flags);
long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index adc860bcbd4f..92d0d47e322c 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1157,7 +1157,7 @@ struct page **io_pin_pages(unsigned long ubuf, unsigned long len, int *npages)
pret = pin_user_pages(ubuf, nr_pages,
FOLL_WRITE | FOLL_LONGTERM | FOLL_SAME_FILE,
- pages, NULL);
+ pages);
if (pret == nr_pages) {
/*
* lookup the first VMA, we require that all VMAs in range
diff --git a/mm/gup.c b/mm/gup.c
index 3954ce499a4a..714970ef3b30 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3132,8 +3132,6 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* @gup_flags: flags modifying lookup behaviour
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
*
* Nearly the same as get_user_pages(), except that FOLL_TOUCH is not set, and
* FOLL_PIN is set.
@@ -3142,15 +3140,14 @@ EXPORT_SYMBOL(pin_user_pages_remote);
* see Documentation/core-api/pin_user_pages.rst for details.
*/
long pin_user_pages(unsigned long start, unsigned long nr_pages,
- unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas)
+ unsigned int gup_flags, struct page **pages)
{
int locked = 1;
- if (!is_valid_gup_args(pages, vmas, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, vmas, &locked, gup_flags);
+ pages, NULL, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);
diff --git a/mm/gup_test.c b/mm/gup_test.c
index 9ba8ea23f84e..1668ce0e0783 100644
--- a/mm/gup_test.c
+++ b/mm/gup_test.c
@@ -146,18 +146,17 @@ static int __gup_test_ioctl(unsigned int cmd,
pages + i);
break;
case PIN_BASIC_TEST:
- nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i,
- NULL);
+ nr = pin_user_pages(addr, nr, gup->gup_flags, pages + i);
break;
case PIN_LONGTERM_BENCHMARK:
nr = pin_user_pages(addr, nr,
gup->gup_flags | FOLL_LONGTERM,
- pages + i, NULL);
+ pages + i);
break;
case DUMP_USER_PAGES_TEST:
if (gup->test_flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
nr = pin_user_pages(addr, nr, gup->gup_flags,
- pages + i, NULL);
+ pages + i);
else
nr = get_user_pages(addr, nr, gup->gup_flags,
pages + i);
@@ -270,7 +269,7 @@ static inline int pin_longterm_test_start(unsigned long arg)
gup_flags, pages);
else
cur_pages = pin_user_pages(addr, remaining_pages,
- gup_flags, pages, NULL);
+ gup_flags, pages);
if (cur_pages < 0) {
pin_longterm_test_stop();
ret = cur_pages;
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 02207e852d79..06cead2b8e34 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -103,7 +103,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem, unsigned long address)
mmap_read_lock(current->mm);
npgs = pin_user_pages(address, umem->npgs,
- gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+ gup_flags | FOLL_LONGTERM, &umem->pgs[0]);
mmap_read_unlock(current->mm);
if (npgs != umem->npgs) {
--
2.40.0
Now we have eliminated all callers to GUP APIs which use the vmas
parameter, eliminate it altogether.
This eliminates a class of bugs where vmas might have been kept around
longer than the mmap_lock and thus we need not be concerned about locks
being dropped during this operation leaving behind dangling pointers.
This simplifies the GUP API and makes it considerably clearer as to its
purpose - follow flags are applied and if pinning, an array of pages is
returned.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
include/linux/hugetlb.h | 10 ++---
mm/gup.c | 83 +++++++++++++++--------------------------
mm/hugetlb.c | 24 +++++-------
3 files changed, 45 insertions(+), 72 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 28703fe22386..2735e7a2b998 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -141,9 +141,8 @@ int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *,
struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
- struct page **, struct vm_area_struct **,
- unsigned long *, unsigned long *, long, unsigned int,
- int *);
+ struct page **, unsigned long *, unsigned long *,
+ long, unsigned int, int *);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
@@ -297,9 +296,8 @@ static inline struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
static inline long follow_hugetlb_page(struct mm_struct *mm,
struct vm_area_struct *vma, struct page **pages,
- struct vm_area_struct **vmas, unsigned long *position,
- unsigned long *nr_pages, long i, unsigned int flags,
- int *nonblocking)
+ unsigned long *position, unsigned long *nr_pages,
+ long i, unsigned int flags, int *nonblocking)
{
BUG();
return 0;
diff --git a/mm/gup.c b/mm/gup.c
index 714970ef3b30..385e428a4acb 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1028,8 +1028,6 @@ static int check_vma_flags(struct vm_area_struct *vma, struct file *file,
* @pages: array that receives pointers to the pages pinned.
* Should be at least nr_pages long. Or NULL, if caller
* only intends to ensure the pages are faulted in.
- * @vmas: array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
* @locked: whether we're still with the mmap_lock held
*
* Returns either number of pages pinned (which may be less than the
@@ -1043,8 +1041,6 @@ static int check_vma_flags(struct vm_area_struct *vma, struct file *file,
*
* The caller is responsible for releasing returned @pages, via put_page().
*
- * @vmas are valid only as long as mmap_lock is held.
- *
* Must be called with mmap_lock held. It may be released. See below.
*
* __get_user_pages walks a process's page tables and takes a reference to
@@ -1080,7 +1076,7 @@ static int check_vma_flags(struct vm_area_struct *vma, struct file *file,
static long __get_user_pages(struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
- struct vm_area_struct **vmas, int *locked)
+ int *locked)
{
long ret = 0, i = 0;
struct vm_area_struct *vma = NULL;
@@ -1124,9 +1120,9 @@ static long __get_user_pages(struct mm_struct *mm,
file = vma->vm_file;
if (is_vm_hugetlb_page(vma)) {
- i = follow_hugetlb_page(mm, vma, pages, vmas,
- &start, &nr_pages, i,
- gup_flags, locked);
+ i = follow_hugetlb_page(mm, vma, pages,
+ &start, &nr_pages, i,
+ gup_flags, locked);
if (!*locked) {
/*
* We've got a VM_FAULT_RETRY
@@ -1191,10 +1187,6 @@ static long __get_user_pages(struct mm_struct *mm,
ctx.page_mask = 0;
}
next_page:
- if (vmas) {
- vmas[i] = vma;
- ctx.page_mask = 0;
- }
page_increm = 1 + (~(start >> PAGE_SHIFT) & ctx.page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
@@ -1349,7 +1341,6 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int flags)
{
@@ -1387,7 +1378,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
pages_done = 0;
for (;;) {
ret = __get_user_pages(mm, start, nr_pages, flags, pages,
- vmas, locked);
+ locked);
if (!(flags & FOLL_UNLOCKABLE)) {
/* VM_FAULT_RETRY couldn't trigger, bypass */
pages_done = ret;
@@ -1451,7 +1442,7 @@ static __always_inline long __get_user_pages_locked(struct mm_struct *mm,
*locked = 1;
ret = __get_user_pages(mm, start, 1, flags | FOLL_TRIED,
- pages, NULL, locked);
+ pages, locked);
if (!*locked) {
/* Continue to retry until we succeeded */
BUG_ON(ret != 0);
@@ -1549,7 +1540,7 @@ long populate_vma_page_range(struct vm_area_struct *vma,
* not result in a stack expansion that recurses back here.
*/
ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked ? locked : &local_locked);
+ NULL, locked ? locked : &local_locked);
lru_add_drain();
return ret;
}
@@ -1607,7 +1598,7 @@ long faultin_vma_page_range(struct vm_area_struct *vma, unsigned long start,
return -EINVAL;
ret = __get_user_pages(mm, start, nr_pages, gup_flags,
- NULL, NULL, locked);
+ NULL, locked);
lru_add_drain();
return ret;
}
@@ -1675,8 +1666,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
#else /* CONFIG_MMU */
static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
unsigned long nr_pages, struct page **pages,
- struct vm_area_struct **vmas, int *locked,
- unsigned int foll_flags)
+ int *locked, unsigned int foll_flags)
{
struct vm_area_struct *vma;
bool must_unlock = false;
@@ -1720,8 +1710,7 @@ static long __get_user_pages_locked(struct mm_struct *mm, unsigned long start,
if (pages[i])
get_page(pages[i]);
}
- if (vmas)
- vmas[i] = vma;
+
start = (start + PAGE_SIZE) & PAGE_MASK;
}
@@ -1902,8 +1891,7 @@ struct page *get_dump_page(unsigned long addr)
int locked = 0;
int ret;
- ret = __get_user_pages_locked(current->mm, addr, 1, &page, NULL,
- &locked,
+ ret = __get_user_pages_locked(current->mm, addr, 1, &page, &locked,
FOLL_FORCE | FOLL_DUMP | FOLL_GET);
return (ret == 1) ? page : NULL;
}
@@ -2076,7 +2064,6 @@ static long __gup_longterm_locked(struct mm_struct *mm,
unsigned long start,
unsigned long nr_pages,
struct page **pages,
- struct vm_area_struct **vmas,
int *locked,
unsigned int gup_flags)
{
@@ -2084,13 +2071,13 @@ static long __gup_longterm_locked(struct mm_struct *mm,
long rc, nr_pinned_pages;
if (!(gup_flags & FOLL_LONGTERM))
- return __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked, gup_flags);
flags = memalloc_pin_save();
do {
nr_pinned_pages = __get_user_pages_locked(mm, start, nr_pages,
- pages, vmas, locked,
+ pages, locked,
gup_flags);
if (nr_pinned_pages <= 0) {
rc = nr_pinned_pages;
@@ -2108,9 +2095,8 @@ static long __gup_longterm_locked(struct mm_struct *mm,
* Check that the given flags are valid for the exported gup/pup interface, and
* update them with the required flags that the caller must have set.
*/
-static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
- int *locked, unsigned int *gup_flags_p,
- unsigned int to_set)
+static bool is_valid_gup_args(struct page **pages, int *locked,
+ unsigned int *gup_flags_p, unsigned int to_set)
{
unsigned int gup_flags = *gup_flags_p;
@@ -2152,13 +2138,6 @@ static bool is_valid_gup_args(struct page **pages, struct vm_area_struct **vmas,
(gup_flags & FOLL_PCI_P2PDMA)))
return false;
- /*
- * Can't use VMAs with locked, as locked allows GUP to unlock
- * which invalidates the vmas array
- */
- if (WARN_ON_ONCE(vmas && (gup_flags & FOLL_UNLOCKABLE)))
- return false;
-
*gup_flags_p = gup_flags;
return true;
}
@@ -2227,11 +2206,11 @@ long get_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;
- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_TOUCH | FOLL_REMOTE))
return -EINVAL;
- return __get_user_pages_locked(mm, start, nr_pages, pages, NULL,
+ return __get_user_pages_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -2266,11 +2245,11 @@ long get_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_TOUCH))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_TOUCH))
return -EINVAL;
return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages);
@@ -2294,12 +2273,12 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_TOUCH | FOLL_UNLOCKABLE))
return -EINVAL;
return __get_user_pages_locked(current->mm, start, nr_pages, pages,
- NULL, &locked, gup_flags);
+ &locked, gup_flags);
}
EXPORT_SYMBOL(get_user_pages_unlocked);
@@ -2982,7 +2961,7 @@ static int internal_get_user_pages_fast(unsigned long start,
start += nr_pinned << PAGE_SHIFT;
pages += nr_pinned;
ret = __gup_longterm_locked(current->mm, start, nr_pages - nr_pinned,
- pages, NULL, &locked,
+ pages, &locked,
gup_flags | FOLL_TOUCH | FOLL_UNLOCKABLE);
if (ret < 0) {
/*
@@ -3024,7 +3003,7 @@ int get_user_pages_fast_only(unsigned long start, int nr_pages,
* FOLL_FAST_ONLY is required in order to match the API description of
* this routine: no fall back to regular ("slow") GUP.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_GET | FOLL_FAST_ONLY))
return -EINVAL;
@@ -3057,7 +3036,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
* FOLL_GET, because gup fast is always a "pin with a +1 page refcount"
* request.
*/
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_GET))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_GET))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3082,7 +3061,7 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
int pin_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages)
{
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return -EINVAL;
return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
}
@@ -3115,10 +3094,10 @@ long pin_user_pages_remote(struct mm_struct *mm,
{
int local_locked = 1;
- if (!is_valid_gup_args(pages, NULL, locked, &gup_flags,
+ if (!is_valid_gup_args(pages, locked, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_REMOTE))
return 0;
- return __gup_longterm_locked(mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(mm, start, nr_pages, pages,
locked ? locked : &local_locked,
gup_flags);
}
@@ -3144,10 +3123,10 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages,
{
int locked = 1;
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags, FOLL_PIN))
+ if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
return 0;
return __gup_longterm_locked(current->mm, start, nr_pages,
- pages, NULL, &locked, gup_flags);
+ pages, &locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages);
@@ -3161,11 +3140,11 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
{
int locked = 0;
- if (!is_valid_gup_args(pages, NULL, NULL, &gup_flags,
+ if (!is_valid_gup_args(pages, NULL, &gup_flags,
FOLL_PIN | FOLL_TOUCH | FOLL_UNLOCKABLE))
return 0;
- return __gup_longterm_locked(current->mm, start, nr_pages, pages, NULL,
+ return __gup_longterm_locked(current->mm, start, nr_pages, pages,
&locked, gup_flags);
}
EXPORT_SYMBOL(pin_user_pages_unlocked);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a08fb47fb200..85138a0394b9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6371,17 +6371,14 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
}
#endif /* CONFIG_USERFAULTFD */
-static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
- int refs, struct page **pages,
- struct vm_area_struct **vmas)
+static void record_subpages(struct page *page, struct vm_area_struct *vma,
+ int refs, struct page **pages)
{
int nr;
for (nr = 0; nr < refs; nr++) {
if (likely(pages))
pages[nr] = nth_page(page, nr);
- if (vmas)
- vmas[nr] = vma;
}
}
@@ -6454,9 +6451,9 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
}
long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
- struct page **pages, struct vm_area_struct **vmas,
- unsigned long *position, unsigned long *nr_pages,
- long i, unsigned int flags, int *locked)
+ struct page **pages, unsigned long *position,
+ unsigned long *nr_pages, long i, unsigned int flags,
+ int *locked)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
@@ -6584,7 +6581,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
* If subpage information not requested, update counters
* and skip the same_page loop below.
*/
- if (!pages && !vmas && !pfn_offset &&
+ if (!pages && !pfn_offset &&
(vaddr + huge_page_size(h) < vma->vm_end) &&
(remainder >= pages_per_huge_page(h))) {
vaddr += huge_page_size(h);
@@ -6599,11 +6596,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
- if (pages || vmas)
- record_subpages_vmas(nth_page(page, pfn_offset),
- vma, refs,
- likely(pages) ? pages + i : NULL,
- vmas ? vmas + i : NULL);
+ if (pages)
+ record_subpages(nth_page(page, pfn_offset),
+ vma, refs,
+ likely(pages) ? pages + i : NULL);
if (pages) {
/*
--
2.40.0
On 2023/04/15 18:08, Lorenzo Stoakes wrote:
> @@ -475,10 +474,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> gup_flags |= FOLL_SPLIT_PMD;
> /* Read the page with vaddr into memory */
> ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
> - &old_page, &vma, NULL);
> + &old_page, NULL);
> if (ret <= 0)
> return ret;
>
> + vma = vma_lookup(mm, vaddr);
> + if (!vma)
> + goto put_old;
> +
> ret = verify_opcode(old_page, vaddr, &opcode);
> if (ret <= 0)
> goto put_old;
This conversion looks wrong.
This causes returning a positive number when vma_lookup() returned NULL.
* Return 0 (success) or a negative errno.
On Sat, Apr 15, 2023 at 06:52:41PM +0900, Tetsuo Handa wrote:
> On 2023/04/15 18:08, Lorenzo Stoakes wrote:
> > @@ -475,10 +474,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> > gup_flags |= FOLL_SPLIT_PMD;
> > /* Read the page with vaddr into memory */
> > ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
> > - &old_page, &vma, NULL);
> > + &old_page, NULL);
> > if (ret <= 0)
> > return ret;
> >
> > + vma = vma_lookup(mm, vaddr);
> > + if (!vma)
> > + goto put_old;
> > +
> > ret = verify_opcode(old_page, vaddr, &opcode);
> > if (ret <= 0)
> > goto put_old;
>
> This conversion looks wrong.
> This causes returning a positive number when vma_lookup() returned NULL.
>
> * Return 0 (success) or a negative errno.
>
In reality it shouldn't be possible for vma to return NULL, I'm adding the
checks to be extra careful.
In any case you're right, attaching a -fix patch to avoid spam:-
----8<----
From 0710d01ea69ad4e846fa1e56a40b253ff59714ac Mon Sep 17 00:00:00 2001
From: Lorenzo Stoakes <[email protected]>
Date: Sat, 15 Apr 2023 11:09:15 +0100
Subject: [PATCH] mm/gup: remove vmas parameter from get_user_pages_remote()
Correct vma NULL check as indicated by Tetsuo.
Signed-off-by: Lorenzo Stoakes <[email protected]>
---
kernel/events/uprobes.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b21993cd2dcc..affe42a13eff 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -479,8 +479,10 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
return ret;
vma = vma_lookup(mm, vaddr);
- if (!vma)
+ if (!vma) {
+ ret = -EINVAL;
goto put_old;
+ }
ret = verify_opcode(old_page, vaddr, &opcode);
if (ret <= 0)
--
2.40.0
On 2023/04/15 19:14, Lorenzo Stoakes wrote:
> On Sat, Apr 15, 2023 at 06:52:41PM +0900, Tetsuo Handa wrote:
>> On 2023/04/15 18:08, Lorenzo Stoakes wrote:
>>> @@ -475,10 +474,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>> gup_flags |= FOLL_SPLIT_PMD;
>>> /* Read the page with vaddr into memory */
>>> ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
>>> - &old_page, &vma, NULL);
>>> + &old_page, NULL);
>>> if (ret <= 0)
>>> return ret;
>>>
>>> + vma = vma_lookup(mm, vaddr);
>>> + if (!vma)
>>> + goto put_old;
>>> +
>>> ret = verify_opcode(old_page, vaddr, &opcode);
>>> if (ret <= 0)
>>> goto put_old;
>>
>> This conversion looks wrong.
>> This causes returning a positive number when vma_lookup() returned NULL.
>>
>> * Return 0 (success) or a negative errno.
>>
>
> In reality it shouldn't be possible for vma to return NULL, I'm adding the
> checks to be extra careful.
>
> In any case you're right, attaching a -fix patch to avoid spam:-
If you want to return -EINVAL when vma_lookup() returned NULL for whatever
unexpected reason, returning -EOPNOTSUPP in below path looks strange.
> @@ -448,7 +448,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
> * would cause the existing tags to be cleared if the page
> * was never mapped with PROT_MTE.
> */
> - if (!(vma->vm_flags & VM_MTE)) {
> + vma = vma_lookup(mm, addr);
> + if (!vma || !(vma->vm_flags & VM_MTE)) {
> ret = -EOPNOTSUPP;
> put_page(page);
> break;
Also,
> @@ -5591,7 +5591,9 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> struct page *page = NULL;
>
> ret = get_user_pages_remote(mm, addr, 1,
> - gup_flags, &page, &vma, NULL);
> + gup_flags, &page, NULL);
> + vma = vma_lookup(mm, addr);
> +
> if (ret <= 0) {
> #ifndef CONFIG_HAVE_IOREMAP_PROT
> break;
> @@ -5600,7 +5602,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> * Check if this is a VM_IO | VM_PFNMAP VMA, which
> * we can access using slightly different code.
> */
> - vma = vma_lookup(mm, addr);
> if (!vma)
> break;
> if (vma->vm_ops && vma->vm_ops->access)
> @@ -5617,11 +5618,11 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> bytes = PAGE_SIZE-offset;
>
> maddr = kmap(page);
> - if (write) {
> + if (write && vma) {
> copy_to_user_page(vma, page, addr,
> maddr + offset, buf, bytes);
> set_page_dirty_lock(page);
> - } else {
> + } else if (vma) {
> copy_from_user_page(vma, page, addr,
> buf, maddr + offset, bytes);
> }
not calling copy_{from,to}_user_page() if vma == NULL is not sufficient for
propagating an error to caller.
On Sat, Apr 15, 2023 at 07:36:06PM +0900, Tetsuo Handa wrote:
> On 2023/04/15 19:14, Lorenzo Stoakes wrote:
> > On Sat, Apr 15, 2023 at 06:52:41PM +0900, Tetsuo Handa wrote:
> >> On 2023/04/15 18:08, Lorenzo Stoakes wrote:
> >>> @@ -475,10 +474,14 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
> >>> gup_flags |= FOLL_SPLIT_PMD;
> >>> /* Read the page with vaddr into memory */
> >>> ret = get_user_pages_remote(mm, vaddr, 1, gup_flags,
> >>> - &old_page, &vma, NULL);
> >>> + &old_page, NULL);
> >>> if (ret <= 0)
> >>> return ret;
> >>>
> >>> + vma = vma_lookup(mm, vaddr);
> >>> + if (!vma)
> >>> + goto put_old;
> >>> +
> >>> ret = verify_opcode(old_page, vaddr, &opcode);
> >>> if (ret <= 0)
> >>> goto put_old;
> >>
> >> This conversion looks wrong.
> >> This causes returning a positive number when vma_lookup() returned NULL.
> >>
> >> * Return 0 (success) or a negative errno.
> >>
> >
> > In reality it shouldn't be possible for vma to return NULL, I'm adding the
> > checks to be extra careful.
> >
> > In any case you're right, attaching a -fix patch to avoid spam:-
>
> If you want to return -EINVAL when vma_lookup() returned NULL for whatever
> unexpected reason, returning -EOPNOTSUPP in below path looks strange.
>
This feels a little pedantic, this is not a condition that is expected to
occur in practice, I'm not sure users will be writing code to differentiate
between the two, and certainly vma being NULL implies MTE is not supported.
To differentiate with minimal churn, I'll add a WARN_ON_ONCE() here and in
each other case where an impossible condition arises as it would be
indicative of a kernel bug.
> > @@ -448,7 +448,8 @@ static int __access_remote_tags(struct mm_struct *mm, unsigned long addr,
> > * would cause the existing tags to be cleared if the page
> > * was never mapped with PROT_MTE.
> > */
> > - if (!(vma->vm_flags & VM_MTE)) {
> > + vma = vma_lookup(mm, addr);
> > + if (!vma || !(vma->vm_flags & VM_MTE)) {
> > ret = -EOPNOTSUPP;
> > put_page(page);
> > break;
>
> Also,
>
> > @@ -5591,7 +5591,9 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> > struct page *page = NULL;
> >
> > ret = get_user_pages_remote(mm, addr, 1,
> > - gup_flags, &page, &vma, NULL);
> > + gup_flags, &page, NULL);
> > + vma = vma_lookup(mm, addr);
> > +
> > if (ret <= 0) {
> > #ifndef CONFIG_HAVE_IOREMAP_PROT
> > break;
> > @@ -5600,7 +5602,6 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> > * Check if this is a VM_IO | VM_PFNMAP VMA, which
> > * we can access using slightly different code.
> > */
> > - vma = vma_lookup(mm, addr);
> > if (!vma)
> > break;
> > if (vma->vm_ops && vma->vm_ops->access)
> > @@ -5617,11 +5618,11 @@ int __access_remote_vm(struct mm_struct *mm, unsigned long addr, void *buf,
> > bytes = PAGE_SIZE-offset;
> >
> > maddr = kmap(page);
> > - if (write) {
> > + if (write && vma) {
> > copy_to_user_page(vma, page, addr,
> > maddr + offset, buf, bytes);
> > set_page_dirty_lock(page);
> > - } else {
> > + } else if (vma) {
> > copy_from_user_page(vma, page, addr,
> > buf, maddr + offset, bytes);
> > }
>
> not calling copy_{from,to}_user_page() if vma == NULL is not sufficient for
> propagating an error to caller.
>
This is a product of wanting to avoid churn, again this condition is simply
impossible. Also as a pedantic side note - the loop explicitly indicates no
errors are propagated, so there is no need to do so.
However, I want to be consistent with how I handle this and also I think
it's sensible to add warnings for violation of this 'impossible' condition
so I'll add a branch for it.
Since I'd end up confusingly fixing up a fix-patch (and I want to change
another patch in series to be consistent), I'll do a respin, apologies for
spam in advance...