2020-11-27 16:46:55

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 00/17] follow_pfn and other iomap races

Hi all

Another update of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/[email protected]/
v2: https://lore.kernel.org/dri-devel/[email protected]
v3: https://lore.kernel.org/dri-devel/[email protected]/
v4: https://lore.kernel.org/dri-devel/[email protected]/
v5: https://lore.kernel.org/dri-devel/[email protected]/
v6: https://lore.kernel.org/dri-devel/[email protected]/

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/[email protected]/

I think the first 12 patches are ready for landing. The parts starting
with "mm: Add unsafe_follow_pfn" probably need more baking time.

Andrew, can you please pick these up, or do you prefer I do a topic branch
and send them to Linus directly in the next merge window?

Changes in v7:
- more acks/reviews
- reordered with the ready pieces at the front
- simplified the new follow_pfn function as Jason suggested

Changes in v6:
- Tested v4l userptr as Tomasz suggested. No boom observed
- Added RFC for locking down follow_pfn, per discussion with Christoph and
Jason.
- Explain why pup_fast is safe in relevant patches, there was a bit a
confusion when discussing v5.
- Fix up the resource patch, with CONFIG_IO_STRICT_DEVMEM it crashed on
boot due to an unintended change (reported by John)

Changes in v5:
- Tomasz found some issues in the media patches
- Polish suggested by Christoph for the unsafe_follow_pfn patch

Changes in v4:
- Drop the s390 patch, that was very stand-alone and now queued up to land
through s390 trees.
- Comment polish per Dan's review.

Changes in v3:
- Bunch of polish all over, no functional changes aside from one barrier
in the resource code, for consistency.
- A few more r-b tags.

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
stuff with a few gdb sessions using our i915 debug tools (hence now also
the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
kernel/resource.c, address_space mapping is now set up at open time for
everyone (which required some sysfs changes). Does indeed look a lot
cleaner and a lot less invasive than I feared at first.

Coments and review on the remaining bits very much welcome, especially
from the kvm and vfio side.

Cheers, Daniel

Daniel Vetter (17):
drm/exynos: Stop using frame_vector helpers
drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
misc/habana: Stop using frame_vector helpers
misc/habana: Use FOLL_LONGTERM for userptr
mm/frame-vector: Use FOLL_LONGTERM
media: videobuf2: Move frame_vector into media subsystem
mm: Close race in generic_access_phys
PCI: Obey iomem restrictions for procfs mmap
/dev/mem: Only set filp->f_mapping
resource: Move devmem revoke code to resource framework
sysfs: Support zapping of binary attr mmaps
PCI: Revoke mappings like devmem
mm: Add unsafe_follow_pfn
media/videobuf1|2: Mark follow_pfn usage as unsafe
vfio/type1: Mark follow_pfn as unsafe
kvm: pass kvm argument to follow_pfn callsites
mm: add mmu_notifier argument to follow_pfn

arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
arch/powerpc/kvm/e500_mmu_host.c | 2 +-
arch/x86/kvm/mmu/mmu.c | 8 +-
drivers/char/mem.c | 86 +-------------
drivers/gpu/drm/exynos/Kconfig | 1 -
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++----
drivers/media/common/videobuf2/Kconfig | 1 -
drivers/media/common/videobuf2/Makefile | 1 +
.../media/common/videobuf2}/frame_vector.c | 57 ++++-----
.../media/common/videobuf2/videobuf2-memops.c | 3 +-
drivers/media/platform/omap/Kconfig | 1 -
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
drivers/misc/habanalabs/Kconfig | 1 -
drivers/misc/habanalabs/common/habanalabs.h | 6 +-
drivers/misc/habanalabs/common/memory.c | 52 +++-----
drivers/pci/pci-sysfs.c | 4 +
drivers/pci/proc.c | 6 +
drivers/vfio/vfio_iommu_type1.c | 4 +-
fs/sysfs/file.c | 11 ++
include/linux/ioport.h | 6 +-
include/linux/kvm_host.h | 9 +-
include/linux/mm.h | 50 +-------
include/linux/sysfs.h | 2 +
include/media/frame_vector.h | 47 ++++++++
include/media/videobuf2-core.h | 1 +
kernel/resource.c | 98 ++++++++++++++-
mm/Kconfig | 3 -
mm/Makefile | 1 -
mm/memory.c | 112 +++++++++++++++---
mm/nommu.c | 16 ++-
security/Kconfig | 13 ++
virt/kvm/kvm_main.c | 56 +++++----
33 files changed, 413 insertions(+), 299 deletions(-)
rename {mm => drivers/media/common/videobuf2}/frame_vector.c (84%)
create mode 100644 include/media/frame_vector.h

--
2.29.2


2020-11-27 16:47:13

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 08/17] PCI: Obey iomem restrictions for procfs mmap

There's three ways to access PCI BARs from userspace: /dev/mem, sysfs
files, and the old proc interface. Two check against
iomem_is_exclusive, proc never did. And with CONFIG_IO_STRICT_DEVMEM,
this starts to matter, since we don't want random userspace having
access to PCI BARs while a driver is loaded and using it.

Fix this by adding the same iomem_is_exclusive() check we already have
on the sysfs side in pci_mmap_resource().

Acked-by: Bjorn Helgaas <[email protected]>
References: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Bjorn Helgaas <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
--
v2: Improve commit message (Bjorn)
---
drivers/pci/proc.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/pci/proc.c b/drivers/pci/proc.c
index d35186b01d98..3a2f90beb4cb 100644
--- a/drivers/pci/proc.c
+++ b/drivers/pci/proc.c
@@ -274,6 +274,11 @@ static int proc_bus_pci_mmap(struct file *file, struct vm_area_struct *vma)
else
return -EINVAL;
}
+
+ if (dev->resource[i].flags & IORESOURCE_MEM &&
+ iomem_is_exclusive(dev->resource[i].start))
+ return -EINVAL;
+
ret = pci_mmap_page_range(dev, i, vma,
fpriv->mmap_state, write_combine);
if (ret < 0)
--
2.29.2

2020-11-27 16:47:39

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 07/17] mm: Close race in generic_access_phys

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since ioremap might need to manipulate pagetables too we need to drop
the pt lock and have a retry loop if we raced.

While at it, also add kerneldoc and improve the comment for the
vma_ops->access function. It's for accessing, not for moving the
memory from iomem to system memory, as the old comment seemed to
suggest.

References: 28b2ee20c7cb ("access_process_vm device memory infrastructure")
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Benjamin Herrensmidt <[email protected]>
Cc: Dave Airlie <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Chris Wilson <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
--
v2: Fix inversion in the retry check (John).

v4: While at it, use offset_in_page (Chris Wilson)
---
include/linux/mm.h | 3 ++-
mm/memory.c | 46 +++++++++++++++++++++++++++++++++++++++++++---
2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index c4cc8ea1402c..b50cbb33d0f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -574,7 +574,8 @@ struct vm_operations_struct {
vm_fault_t (*pfn_mkwrite)(struct vm_fault *vmf);

/* called by access_process_vm when get_user_pages() fails, typically
- * for use by special VMAs that can switch between memory and hardware
+ * for use by special VMAs. See also generic_access_phys() for a generic
+ * implementation useful for any iomem mapping.
*/
int (*access)(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write);
diff --git a/mm/memory.c b/mm/memory.c
index c48f8df6e502..ac32039ce941 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4847,28 +4847,68 @@ int follow_phys(struct vm_area_struct *vma,
return ret;
}

+/**
+ * generic_access_phys - generic implementation for iomem mmap access
+ * @vma: the vma to access
+ * @addr: userspace addres, not relative offset within @vma
+ * @buf: buffer to read/write
+ * @len: length of transfer
+ * @write: set to FOLL_WRITE when writing, otherwise reading
+ *
+ * This is a generic implementation for &vm_operations_struct.access for an
+ * iomem mapping. This callback is used by access_process_vm() when the @vma is
+ * not page based.
+ */
int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
void *buf, int len, int write)
{
resource_size_t phys_addr;
unsigned long prot = 0;
void __iomem *maddr;
- int offset = addr & (PAGE_SIZE-1);
+ pte_t *ptep, pte;
+ spinlock_t *ptl;
+ int offset = offset_in_page(addr);
+ int ret = -EINVAL;
+
+ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+ return -EINVAL;
+
+retry:
+ if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+ return -EINVAL;
+ pte = *ptep;
+ pte_unmap_unlock(ptep, ptl);

- if (follow_phys(vma, addr, write, &prot, &phys_addr))
+ prot = pgprot_val(pte_pgprot(pte));
+ phys_addr = (resource_size_t)pte_pfn(pte) << PAGE_SHIFT;
+
+ if ((write & FOLL_WRITE) && !pte_write(pte))
return -EINVAL;

maddr = ioremap_prot(phys_addr, PAGE_ALIGN(len + offset), prot);
if (!maddr)
return -ENOMEM;

+ if (follow_pte(vma->vm_mm, addr, &ptep, &ptl))
+ goto out_unmap;
+
+ if (!pte_same(pte, *ptep)) {
+ pte_unmap_unlock(ptep, ptl);
+ iounmap(maddr);
+
+ goto retry;
+ }
+
if (write)
memcpy_toio(maddr + offset, buf, len);
else
memcpy_fromio(buf, maddr + offset, len);
+ ret = len;
+ pte_unmap_unlock(ptep, ptl);
+out_unmap:
iounmap(maddr);

- return len;
+ return ret;
}
EXPORT_SYMBOL_GPL(generic_access_phys);
#endif
--
2.29.2

2020-11-27 16:48:38

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 02/17] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Kukjin Kim <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
goto err_free;
}

- ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ ret = pin_user_pages_fast(start, npages,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
g2d_userptr->pages);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
--
2.29.2

2020-11-27 16:48:46

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 04/17] misc/habana: Use FOLL_LONGTERM for userptr

These are persistent, not just for the duration of a dma operation.

Reviewed-by: Oded Gabbay <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Oded Gabbay <[email protected]>
Cc: Omer Shpigelman <[email protected]>
Cc: Ofir Bitton <[email protected]>
Cc: Tomer Tayar <[email protected]>
Cc: Moti Haimovski <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Pawel Piskorski <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/misc/habanalabs/common/memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 2c59fa869684..0d25ae1d5f3e 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1296,7 +1296,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
if (!userptr->pages)
return -ENOMEM;

- rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ rc = pin_user_pages_fast(start, npages,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
userptr->pages);

if (rc != npages) {
--
2.29.2

2020-11-27 20:24:20

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 05/17] mm/frame-vector: Use FOLL_LONGTERM

This is used by media/videbuf2 for persistent dma mappings, not just
for a single dma operation and then freed again, so needs
FOLL_LONGTERM.

Unfortunately current pup_locked doesn't support FOLL_LONGTERM due to
locking issues. Rework the code to pull the pup path out from the
mmap_sem critical section as suggested by Jason.

By relying entirely on the vma checks in pin_user_pages and follow_pfn
(for vm_flags and vma_is_fsdax) we can also streamline the code a lot.

Note that pin_user_pages_fast is a safe replacement despite the
seeming lack of checking for vma->vm_flasg & (VM_IO | VM_PFNMAP). Such
ptes are marked with pte_mkspecial (which pup_fast rejects in the
fastpath), and only architectures supporting that support the
pin_user_pages_fast fastpath.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Pawel Osciak <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Tomasz Figa <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
--
v2: Streamline the code and further simplify the loop checks (Jason)

v5: Review from Tomasz:
- fix page counting for the follow_pfn case by resetting ret
- drop gup_flags paramater, now unused

v6: Explain why pup_fast is safe, after discussions with John and
Christoph.
---
.../media/common/videobuf2/videobuf2-memops.c | 3 +-
include/linux/mm.h | 2 +-
mm/frame_vector.c | 53 ++++++-------------
3 files changed, 19 insertions(+), 39 deletions(-)

diff --git a/drivers/media/common/videobuf2/videobuf2-memops.c b/drivers/media/common/videobuf2/videobuf2-memops.c
index 6e9e05153f4e..9dd6c27162f4 100644
--- a/drivers/media/common/videobuf2/videobuf2-memops.c
+++ b/drivers/media/common/videobuf2/videobuf2-memops.c
@@ -40,7 +40,6 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
unsigned long first, last;
unsigned long nr;
struct frame_vector *vec;
- unsigned int flags = FOLL_FORCE | FOLL_WRITE;

first = start >> PAGE_SHIFT;
last = (start + length - 1) >> PAGE_SHIFT;
@@ -48,7 +47,7 @@ struct frame_vector *vb2_create_framevec(unsigned long start,
vec = frame_vector_create(nr);
if (!vec)
return ERR_PTR(-ENOMEM);
- ret = get_vaddr_frames(start & PAGE_MASK, nr, flags, vec);
+ ret = get_vaddr_frames(start & PAGE_MASK, nr, vec);
if (ret < 0)
goto out_destroy;
/* We accept only complete set of PFNs */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47bff16c182d..29a1941cd255 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1765,7 +1765,7 @@ struct frame_vector {
struct frame_vector *frame_vector_create(unsigned int nr_frames);
void frame_vector_destroy(struct frame_vector *vec);
int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
- unsigned int gup_flags, struct frame_vector *vec);
+ struct frame_vector *vec);
void put_vaddr_frames(struct frame_vector *vec);
int frame_vector_to_pages(struct frame_vector *vec);
void frame_vector_to_pfns(struct frame_vector *vec);
diff --git a/mm/frame_vector.c b/mm/frame_vector.c
index 10f82d5643b6..f8c34b895c76 100644
--- a/mm/frame_vector.c
+++ b/mm/frame_vector.c
@@ -32,13 +32,12 @@
* This function takes care of grabbing mmap_lock as necessary.
*/
int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
- unsigned int gup_flags, struct frame_vector *vec)
+ struct frame_vector *vec)
{
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
int ret = 0;
int err;
- int locked;

if (nr_frames == 0)
return 0;
@@ -48,40 +47,26 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,

start = untagged_addr(start);

- mmap_read_lock(mm);
- locked = 1;
- vma = find_vma_intersection(mm, start, start + 1);
- if (!vma) {
- ret = -EFAULT;
- goto out;
- }
-
- /*
- * While get_vaddr_frames() could be used for transient (kernel
- * controlled lifetime) pinning of memory pages all current
- * users establish long term (userspace controlled lifetime)
- * page pinning. Treat get_vaddr_frames() like
- * get_user_pages_longterm() and disallow it for filesystem-dax
- * mappings.
- */
- if (vma_is_fsdax(vma)) {
- ret = -EOPNOTSUPP;
- goto out;
- }
-
- if (!(vma->vm_flags & (VM_IO | VM_PFNMAP))) {
+ ret = pin_user_pages_fast(start, nr_frames,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
+ (struct page **)(vec->ptrs));
+ if (ret > 0) {
vec->got_ref = true;
vec->is_pfns = false;
- ret = pin_user_pages_locked(start, nr_frames,
- gup_flags, (struct page **)(vec->ptrs), &locked);
- goto out;
+ goto out_unlocked;
}

+ mmap_read_lock(mm);
vec->got_ref = false;
vec->is_pfns = true;
+ ret = 0;
do {
unsigned long *nums = frame_vector_pfns(vec);

+ vma = find_vma_intersection(mm, start, start + 1);
+ if (!vma)
+ break;
+
while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
err = follow_pfn(vma, start, &nums[ret]);
if (err) {
@@ -92,17 +77,13 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
start += PAGE_SIZE;
ret++;
}
- /*
- * We stop if we have enough pages or if VMA doesn't completely
- * cover the tail page.
- */
- if (ret >= nr_frames || start < vma->vm_end)
+ /* Bail out if VMA doesn't completely cover the tail page. */
+ if (start < vma->vm_end)
break;
- vma = find_vma_intersection(mm, start, start + 1);
- } while (vma && vma->vm_flags & (VM_IO | VM_PFNMAP));
+ } while (ret < nr_frames);
out:
- if (locked)
- mmap_read_unlock(mm);
+ mmap_read_unlock(mm);
+out_unlocked:
if (!ret)
ret = -EFAULT;
if (ret > 0)
--
2.29.2

2020-11-28 00:26:06

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v7 14/17] media/videobuf1|2: Mark follow_pfn usage as unsafe

The media model assumes that buffers are all preallocated, so that
when a media pipeline is running we never miss a deadline because the
buffers aren't allocated or available.

This means we cannot fix the v4l follow_pfn usage through
mmu_notifier, without breaking how this all works. The only real fix
is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
tell everyone to cut over to dma-buf memory sharing for zerocopy.

userptr for normal memory will keep working as-is, this only affects
the zerocopy userptr usage enabled in 50ac952d2263 ("[media]
videobuf2-dma-sg: Support io userptr operations on io memory").

Acked-by: Tomasz Figa <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Pawel Osciak <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Tomasz Figa <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Daniel Jordan <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
--
v3:
- Reference the commit that enabled the zerocopy userptr use case to
make it abundandtly clear that this patch only affects that, and not
normal memory userptr. The old commit message already explained that
normal memory userptr is unaffected, but I guess that was not clear
enough.
---
drivers/media/common/videobuf2/frame_vector.c | 2 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
index a0e65481a201..1a82ec13ea00 100644
--- a/drivers/media/common/videobuf2/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -70,7 +70,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
break;

while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
- err = follow_pfn(vma, start, &nums[ret]);
+ err = unsafe_follow_pfn(vma, start, &nums[ret]);
if (err) {
if (ret == 0)
ret = err;
diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
index 52312ce2ba05..821c4a76ab96 100644
--- a/drivers/media/v4l2-core/videobuf-dma-contig.c
+++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
@@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
user_address = untagged_baddr;

while (pages_done < (mem->size >> PAGE_SHIFT)) {
- ret = follow_pfn(vma, user_address, &this_pfn);
+ ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
if (ret)
break;

--
2.29.2

2020-12-22 16:09:34

by Oded Gabbay

[permalink] [raw]
Subject: Re: [PATCH v7 04/17] misc/habana: Use FOLL_LONGTERM for userptr

On Fri, Nov 27, 2020 at 6:42 PM Daniel Vetter <[email protected]> wrote:
>
> These are persistent, not just for the duration of a dma operation.
>
> Reviewed-by: Oded Gabbay <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Oded Gabbay <[email protected]>
> Cc: Omer Shpigelman <[email protected]>
> Cc: Ofir Bitton <[email protected]>
> Cc: Tomer Tayar <[email protected]>
> Cc: Moti Haimovski <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Pawel Piskorski <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> drivers/misc/habanalabs/common/memory.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 2c59fa869684..0d25ae1d5f3e 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1296,7 +1296,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> if (!userptr->pages)
> return -ENOMEM;
>
> - rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> + rc = pin_user_pages_fast(start, npages,
> + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> userptr->pages);
>
> if (rc != npages) {
> --
> 2.29.2
>

Hi Daniel,
Is this patch-set going to be in 5.11 ?
If not, can I just pick the two patches relevant to my driver and push
them through my tree ?

Thanks,
Oded

2021-01-12 13:26:49

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v7 00/17] follow_pfn and other iomap races

On Fri, Nov 27, 2020 at 05:41:14PM +0100, Daniel Vetter wrote:
> Hi all
>
> Another update of my patch series to clamp down a bunch of races and gaps
> around follow_pfn and other access to iomem mmaps. Previous version:
>
> v1: https://lore.kernel.org/dri-devel/[email protected]/
> v2: https://lore.kernel.org/dri-devel/[email protected]
> v3: https://lore.kernel.org/dri-devel/[email protected]/
> v4: https://lore.kernel.org/dri-devel/[email protected]/
> v5: https://lore.kernel.org/dri-devel/[email protected]/
> v6: https://lore.kernel.org/dri-devel/[email protected]/
>
> And the discussion that sparked this journey:
>
> https://lore.kernel.org/dri-devel/[email protected]/
>
> I think the first 12 patches are ready for landing. The parts starting
> with "mm: Add unsafe_follow_pfn" probably need more baking time.
>
> Andrew, can you please pick these up, or do you prefer I do a topic branch
> and send them to Linus directly in the next merge window?
>
> Changes in v7:
> - more acks/reviews
> - reordered with the ready pieces at the front
> - simplified the new follow_pfn function as Jason suggested
>
> Changes in v6:
> - Tested v4l userptr as Tomasz suggested. No boom observed
> - Added RFC for locking down follow_pfn, per discussion with Christoph and
> Jason.
> - Explain why pup_fast is safe in relevant patches, there was a bit a
> confusion when discussing v5.
> - Fix up the resource patch, with CONFIG_IO_STRICT_DEVMEM it crashed on
> boot due to an unintended change (reported by John)
>
> Changes in v5:
> - Tomasz found some issues in the media patches
> - Polish suggested by Christoph for the unsafe_follow_pfn patch
>
> Changes in v4:
> - Drop the s390 patch, that was very stand-alone and now queued up to land
> through s390 trees.
> - Comment polish per Dan's review.
>
> Changes in v3:
> - Bunch of polish all over, no functional changes aside from one barrier
> in the resource code, for consistency.
> - A few more r-b tags.
>
> Changes in v2:
> - tons of small polish&fixes all over, thanks to all the reviewers who
> spotted issues
> - I managed to test at least the generic_access_phys and pci mmap revoke
> stuff with a few gdb sessions using our i915 debug tools (hence now also
> the drm/i915 patch to properly request all the pci bar regions)
> - reworked approach for the pci mmap revoke: Infrastructure moved into
> kernel/resource.c, address_space mapping is now set up at open time for
> everyone (which required some sysfs changes). Does indeed look a lot
> cleaner and a lot less invasive than I feared at first.
>
> Coments and review on the remaining bits very much welcome, especially
> from the kvm and vfio side.
>
> Cheers, Daniel
>
> Daniel Vetter (17):
> drm/exynos: Stop using frame_vector helpers
> drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
> misc/habana: Stop using frame_vector helpers
> misc/habana: Use FOLL_LONGTERM for userptr
> mm/frame-vector: Use FOLL_LONGTERM
> media: videobuf2: Move frame_vector into media subsystem
> mm: Close race in generic_access_phys
> PCI: Obey iomem restrictions for procfs mmap
> /dev/mem: Only set filp->f_mapping
> resource: Move devmem revoke code to resource framework
> sysfs: Support zapping of binary attr mmaps
> PCI: Revoke mappings like devmem

As Jason suggested, I've pulled the first 1 patches into a topic branch.

Stephen, can you please add the below to linux-next for the 5.12 merge
window?

git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup

Once this part has landed I'll see what to do with the below part.

Thanks, Daniel

> mm: Add unsafe_follow_pfn
> media/videobuf1|2: Mark follow_pfn usage as unsafe
> vfio/type1: Mark follow_pfn as unsafe
> kvm: pass kvm argument to follow_pfn callsites
> mm: add mmu_notifier argument to follow_pfn
>
> arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
> arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
> arch/powerpc/kvm/e500_mmu_host.c | 2 +-
> arch/x86/kvm/mmu/mmu.c | 8 +-
> drivers/char/mem.c | 86 +-------------
> drivers/gpu/drm/exynos/Kconfig | 1 -
> drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++----
> drivers/media/common/videobuf2/Kconfig | 1 -
> drivers/media/common/videobuf2/Makefile | 1 +
> .../media/common/videobuf2}/frame_vector.c | 57 ++++-----
> .../media/common/videobuf2/videobuf2-memops.c | 3 +-
> drivers/media/platform/omap/Kconfig | 1 -
> drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
> drivers/misc/habanalabs/Kconfig | 1 -
> drivers/misc/habanalabs/common/habanalabs.h | 6 +-
> drivers/misc/habanalabs/common/memory.c | 52 +++-----
> drivers/pci/pci-sysfs.c | 4 +
> drivers/pci/proc.c | 6 +
> drivers/vfio/vfio_iommu_type1.c | 4 +-
> fs/sysfs/file.c | 11 ++
> include/linux/ioport.h | 6 +-
> include/linux/kvm_host.h | 9 +-
> include/linux/mm.h | 50 +-------
> include/linux/sysfs.h | 2 +
> include/media/frame_vector.h | 47 ++++++++
> include/media/videobuf2-core.h | 1 +
> kernel/resource.c | 98 ++++++++++++++-
> mm/Kconfig | 3 -
> mm/Makefile | 1 -
> mm/memory.c | 112 +++++++++++++++---
> mm/nommu.c | 16 ++-
> security/Kconfig | 13 ++
> virt/kvm/kvm_main.c | 56 +++++----
> 33 files changed, 413 insertions(+), 299 deletions(-)
> rename {mm => drivers/media/common/videobuf2}/frame_vector.c (84%)
> create mode 100644 include/media/frame_vector.h
>
> --
> 2.29.2
>
> _______________________________________________
> dri-devel mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-01-12 13:30:41

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v7 00/17] follow_pfn and other iomap races

On Tue, Jan 12, 2021 at 2:24 PM Daniel Vetter <[email protected]> wrote:
>
> On Fri, Nov 27, 2020 at 05:41:14PM +0100, Daniel Vetter wrote:
> > Hi all
> >
> > Another update of my patch series to clamp down a bunch of races and gaps
> > around follow_pfn and other access to iomem mmaps. Previous version:
> >
> > v1: https://lore.kernel.org/dri-devel/[email protected]/
> > v2: https://lore.kernel.org/dri-devel/[email protected]
> > v3: https://lore.kernel.org/dri-devel/[email protected]/
> > v4: https://lore.kernel.org/dri-devel/[email protected]/
> > v5: https://lore.kernel.org/dri-devel/[email protected]/
> > v6: https://lore.kernel.org/dri-devel/[email protected]/
> >
> > And the discussion that sparked this journey:
> >
> > https://lore.kernel.org/dri-devel/[email protected]/
> >
> > I think the first 12 patches are ready for landing. The parts starting
> > with "mm: Add unsafe_follow_pfn" probably need more baking time.
> >
> > Andrew, can you please pick these up, or do you prefer I do a topic branch
> > and send them to Linus directly in the next merge window?
> >
> > Changes in v7:
> > - more acks/reviews
> > - reordered with the ready pieces at the front
> > - simplified the new follow_pfn function as Jason suggested
> >
> > Changes in v6:
> > - Tested v4l userptr as Tomasz suggested. No boom observed
> > - Added RFC for locking down follow_pfn, per discussion with Christoph and
> > Jason.
> > - Explain why pup_fast is safe in relevant patches, there was a bit a
> > confusion when discussing v5.
> > - Fix up the resource patch, with CONFIG_IO_STRICT_DEVMEM it crashed on
> > boot due to an unintended change (reported by John)
> >
> > Changes in v5:
> > - Tomasz found some issues in the media patches
> > - Polish suggested by Christoph for the unsafe_follow_pfn patch
> >
> > Changes in v4:
> > - Drop the s390 patch, that was very stand-alone and now queued up to land
> > through s390 trees.
> > - Comment polish per Dan's review.
> >
> > Changes in v3:
> > - Bunch of polish all over, no functional changes aside from one barrier
> > in the resource code, for consistency.
> > - A few more r-b tags.
> >
> > Changes in v2:
> > - tons of small polish&fixes all over, thanks to all the reviewers who
> > spotted issues
> > - I managed to test at least the generic_access_phys and pci mmap revoke
> > stuff with a few gdb sessions using our i915 debug tools (hence now also
> > the drm/i915 patch to properly request all the pci bar regions)
> > - reworked approach for the pci mmap revoke: Infrastructure moved into
> > kernel/resource.c, address_space mapping is now set up at open time for
> > everyone (which required some sysfs changes). Does indeed look a lot
> > cleaner and a lot less invasive than I feared at first.
> >
> > Coments and review on the remaining bits very much welcome, especially
> > from the kvm and vfio side.
> >
> > Cheers, Daniel
> >
> > Daniel Vetter (17):
> > drm/exynos: Stop using frame_vector helpers
> > drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
> > misc/habana: Stop using frame_vector helpers
> > misc/habana: Use FOLL_LONGTERM for userptr
> > mm/frame-vector: Use FOLL_LONGTERM
> > media: videobuf2: Move frame_vector into media subsystem
> > mm: Close race in generic_access_phys
> > PCI: Obey iomem restrictions for procfs mmap
> > /dev/mem: Only set filp->f_mapping
> > resource: Move devmem revoke code to resource framework
> > sysfs: Support zapping of binary attr mmaps
> > PCI: Revoke mappings like devmem
>
> As Jason suggested, I've pulled the first 1 patches into a topic branch.

Uh this was meant to read "first _12_ patches" ofc.
-Daniel

>
> Stephen, can you please add the below to linux-next for the 5.12 merge
> window?
>
> git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup
>
> Once this part has landed I'll see what to do with the below part.
>
> Thanks, Daniel
>
> > mm: Add unsafe_follow_pfn
> > media/videobuf1|2: Mark follow_pfn usage as unsafe
> > vfio/type1: Mark follow_pfn as unsafe
> > kvm: pass kvm argument to follow_pfn callsites
> > mm: add mmu_notifier argument to follow_pfn
> >
> > arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +-
> > arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +-
> > arch/powerpc/kvm/e500_mmu_host.c | 2 +-
> > arch/x86/kvm/mmu/mmu.c | 8 +-
> > drivers/char/mem.c | 86 +-------------
> > drivers/gpu/drm/exynos/Kconfig | 1 -
> > drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++----
> > drivers/media/common/videobuf2/Kconfig | 1 -
> > drivers/media/common/videobuf2/Makefile | 1 +
> > .../media/common/videobuf2}/frame_vector.c | 57 ++++-----
> > .../media/common/videobuf2/videobuf2-memops.c | 3 +-
> > drivers/media/platform/omap/Kconfig | 1 -
> > drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
> > drivers/misc/habanalabs/Kconfig | 1 -
> > drivers/misc/habanalabs/common/habanalabs.h | 6 +-
> > drivers/misc/habanalabs/common/memory.c | 52 +++-----
> > drivers/pci/pci-sysfs.c | 4 +
> > drivers/pci/proc.c | 6 +
> > drivers/vfio/vfio_iommu_type1.c | 4 +-
> > fs/sysfs/file.c | 11 ++
> > include/linux/ioport.h | 6 +-
> > include/linux/kvm_host.h | 9 +-
> > include/linux/mm.h | 50 +-------
> > include/linux/sysfs.h | 2 +
> > include/media/frame_vector.h | 47 ++++++++
> > include/media/videobuf2-core.h | 1 +
> > kernel/resource.c | 98 ++++++++++++++-
> > mm/Kconfig | 3 -
> > mm/Makefile | 1 -
> > mm/memory.c | 112 +++++++++++++++---
> > mm/nommu.c | 16 ++-
> > security/Kconfig | 13 ++
> > virt/kvm/kvm_main.c | 56 +++++----
> > 33 files changed, 413 insertions(+), 299 deletions(-)
> > rename {mm => drivers/media/common/videobuf2}/frame_vector.c (84%)
> > create mode 100644 include/media/frame_vector.h
> >
> > --
> > 2.29.2
> >
> > _______________________________________________
> > dri-devel mailing list
> > [email protected]
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2021-01-13 02:34:58

by Stephen Rothwell

[permalink] [raw]
Subject: Re: [PATCH v7 00/17] follow_pfn and other iomap races

Hi Daniel,

On Tue, 12 Jan 2021 14:24:27 +0100 Daniel Vetter <[email protected]> wrote:
>
> As Jason suggested, I've pulled the first 1 patches into a topic branch.
>
> Stephen, can you please add the below to linux-next for the 5.12 merge
> window?
>
> git://anongit.freedesktop.org/drm/drm topic/iomem-mmap-vs-gup

Added from today.

Thanks for adding your subsystem tree as a participant of linux-next. As
you may know, this is not a judgement of your code. The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window.

You will need to ensure that the patches/commits in your tree/series have
been:
* submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
* posted to the relevant mailing list,
* reviewed by you (or another maintainer of your subsystem tree),
* successfully unit tested, and
* destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch). It is allowed to be rebased if you deem it necessary.

--
Cheers,
Stephen Rothwell
[email protected]


Attachments:
(No filename) (499.00 B)
OpenPGP digital signature

2021-01-19 12:47:45

by Hans Verkuil

[permalink] [raw]
Subject: Re: [PATCH v7 14/17] media/videobuf1|2: Mark follow_pfn usage as unsafe

On 27/11/2020 17:41, Daniel Vetter wrote:
> The media model assumes that buffers are all preallocated, so that
> when a media pipeline is running we never miss a deadline because the
> buffers aren't allocated or available.
>
> This means we cannot fix the v4l follow_pfn usage through
> mmu_notifier, without breaking how this all works. The only real fix
> is to deprecate userptr support for VM_IO | VM_PFNMAP mappings and
> tell everyone to cut over to dma-buf memory sharing for zerocopy.
>
> userptr for normal memory will keep working as-is, this only affects
> the zerocopy userptr usage enabled in 50ac952d2263 ("[media]
> videobuf2-dma-sg: Support io userptr operations on io memory").
>
> Acked-by: Tomasz Figa <[email protected]>

Acked-by: Hans Verkuil <[email protected]>

Regards,

Hans

> Signed-off-by: Daniel Vetter <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Pawel Osciak <[email protected]>
> Cc: Marek Szyprowski <[email protected]>
> Cc: Kyungmin Park <[email protected]>
> Cc: Tomasz Figa <[email protected]>
> Cc: Laurent Dufour <[email protected]>
> Cc: Vlastimil Babka <[email protected]>
> Cc: Daniel Jordan <[email protected]>
> Cc: Michel Lespinasse <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> --
> v3:
> - Reference the commit that enabled the zerocopy userptr use case to
> make it abundandtly clear that this patch only affects that, and not
> normal memory userptr. The old commit message already explained that
> normal memory userptr is unaffected, but I guess that was not clear
> enough.
> ---
> drivers/media/common/videobuf2/frame_vector.c | 2 +-
> drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/media/common/videobuf2/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
> index a0e65481a201..1a82ec13ea00 100644
> --- a/drivers/media/common/videobuf2/frame_vector.c
> +++ b/drivers/media/common/videobuf2/frame_vector.c
> @@ -70,7 +70,7 @@ int get_vaddr_frames(unsigned long start, unsigned int nr_frames,
> break;
>
> while (ret < nr_frames && start + PAGE_SIZE <= vma->vm_end) {
> - err = follow_pfn(vma, start, &nums[ret]);
> + err = unsafe_follow_pfn(vma, start, &nums[ret]);
> if (err) {
> if (ret == 0)
> ret = err;
> diff --git a/drivers/media/v4l2-core/videobuf-dma-contig.c b/drivers/media/v4l2-core/videobuf-dma-contig.c
> index 52312ce2ba05..821c4a76ab96 100644
> --- a/drivers/media/v4l2-core/videobuf-dma-contig.c
> +++ b/drivers/media/v4l2-core/videobuf-dma-contig.c
> @@ -183,7 +183,7 @@ static int videobuf_dma_contig_user_get(struct videobuf_dma_contig_memory *mem,
> user_address = untagged_baddr;
>
> while (pages_done < (mem->size >> PAGE_SHIFT)) {
> - ret = follow_pfn(vma, user_address, &this_pfn);
> + ret = unsafe_follow_pfn(vma, user_address, &this_pfn);
> if (ret)
> break;
>
>