2020-10-21 13:11:58

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 00/16] follow_pfn and other iomap races

Hi all,

Round 3 of my patch series to clamp down a bunch of races and gaps
around follow_pfn and other access to iomem mmaps. Previous version:

v1: https://lore.kernel.org/dri-devel/[email protected]/
v2: https://lore.kernel.org/dri-devel/[email protected]

And the discussion that sparked this journey:

https://lore.kernel.org/dri-devel/[email protected]/

I was waiting for the testing result for habanalabs from Oded, but I guess
Oded was waiting for my v3.

Changes in v3:
- Bunch of polish all over, no functional changes aside from one barrier
in the resource code, for consistency.
- A few more r-b tags.

Changes in v2:
- tons of small polish&fixes all over, thanks to all the reviewers who
spotted issues
- I managed to test at least the generic_access_phys and pci mmap revoke
stuff with a few gdb sessions using our i915 debug tools (hence now also
the drm/i915 patch to properly request all the pci bar regions)
- reworked approach for the pci mmap revoke: Infrastructure moved into
kernel/resource.c, address_space mapping is now set up at open time for
everyone (which required some sysfs changes). Does indeed look a lot
cleaner and a lot less invasive than I feared at first.

The big thing I can't test are all the frame_vector changes in habanalbas,
exynos and media. Gerald has given the s390 patch a spin already.

Review, testing, feedback all very much welcome.

Cheers, Daniel
Daniel Vetter (16):
drm/exynos: Stop using frame_vector helpers
drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
misc/habana: Stop using frame_vector helpers
misc/habana: Use FOLL_LONGTERM for userptr
mm/frame-vector: Use FOLL_LONGTERM
media: videobuf2: Move frame_vector into media subsystem
mm: Close race in generic_access_phys
s390/pci: Remove races against pte updates
mm: Add unsafe_follow_pfn
media/videbuf1|2: Mark follow_pfn usage as unsafe
vfio/type1: Mark follow_pfn as unsafe
PCI: Obey iomem restrictions for procfs mmap
/dev/mem: Only set filp->f_mapping
resource: Move devmem revoke code to resource framework
sysfs: Support zapping of binary attr mmaps
PCI: Revoke mappings like devmem

arch/s390/pci/pci_mmio.c | 98 ++++++++++-------
drivers/char/mem.c | 86 +--------------
drivers/gpu/drm/exynos/Kconfig | 1 -
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 48 ++++-----
drivers/media/common/videobuf2/Kconfig | 1 -
drivers/media/common/videobuf2/Makefile | 1 +
.../media/common/videobuf2}/frame_vector.c | 54 ++++------
drivers/media/platform/omap/Kconfig | 1 -
drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +-
drivers/misc/habanalabs/Kconfig | 1 -
drivers/misc/habanalabs/common/habanalabs.h | 6 +-
drivers/misc/habanalabs/common/memory.c | 50 ++++-----
drivers/pci/pci-sysfs.c | 4 +
drivers/pci/proc.c | 6 ++
drivers/vfio/vfio_iommu_type1.c | 4 +-
fs/sysfs/file.c | 11 ++
include/linux/ioport.h | 6 +-
include/linux/mm.h | 47 +-------
include/linux/sysfs.h | 2 +
include/media/frame_vector.h | 47 ++++++++
include/media/videobuf2-core.h | 1 +
kernel/resource.c | 101 +++++++++++++++++-
mm/Kconfig | 3 -
mm/Makefile | 1 -
mm/memory.c | 76 ++++++++++++-
mm/nommu.c | 17 +++
security/Kconfig | 13 +++
27 files changed, 403 insertions(+), 285 deletions(-)
rename {mm => drivers/media/common/videobuf2}/frame_vector.c (85%)
create mode 100644 include/media/frame_vector.h

--
2.28.0


2020-10-21 13:12:51

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 02/16] drm/exynos: Use FOLL_LONGTERM for g2d cmdlists

The exynos g2d interface is very unusual, but it looks like the
userptr objects are persistent. Hence they need FOLL_LONGTERM.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Kukjin Kim <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index ecede41af9b9..1e0c5a7f206e 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -475,7 +475,8 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
goto err_free;
}

- ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ ret = pin_user_pages_fast(start, npages,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
g2d_userptr->pages);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
--
2.28.0

2020-10-21 13:14:57

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 03/16] misc/habana: Stop using frame_vector helpers

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Oded Gabbay <[email protected]>
Cc: Omer Shpigelman <[email protected]>
Cc: Ofir Bitton <[email protected]>
Cc: Tomer Tayar <[email protected]>
Cc: Moti Haimovski <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Pawel Piskorski <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
--
v2: Use unpin_user_pages_dirty_lock (John)
v3: Update kerneldoc (Oded)
---
drivers/misc/habanalabs/Kconfig | 1 -
drivers/misc/habanalabs/common/habanalabs.h | 6 ++-
drivers/misc/habanalabs/common/memory.c | 49 ++++++++-------------
3 files changed, 22 insertions(+), 34 deletions(-)

diff --git a/drivers/misc/habanalabs/Kconfig b/drivers/misc/habanalabs/Kconfig
index 8eb5d38c618e..2f04187f7167 100644
--- a/drivers/misc/habanalabs/Kconfig
+++ b/drivers/misc/habanalabs/Kconfig
@@ -6,7 +6,6 @@
config HABANA_AI
tristate "HabanaAI accelerators (habanalabs)"
depends on PCI && HAS_IOMEM
- select FRAME_VECTOR
select DMA_SHARED_BUFFER
select GENERIC_ALLOCATOR
select HWMON
diff --git a/drivers/misc/habanalabs/common/habanalabs.h b/drivers/misc/habanalabs/common/habanalabs.h
index edbd627b29d2..41af090b3e6a 100644
--- a/drivers/misc/habanalabs/common/habanalabs.h
+++ b/drivers/misc/habanalabs/common/habanalabs.h
@@ -870,7 +870,8 @@ struct hl_ctx_mgr {
* struct hl_userptr - memory mapping chunk information
* @vm_type: type of the VM.
* @job_node: linked-list node for hanging the object on the Job's list.
- * @vec: pointer to the frame vector.
+ * @pages: pointer to struct page array
+ * @npages: size of @pages array
* @sgt: pointer to the scatter-gather table that holds the pages.
* @dir: for DMA unmapping, the direction must be supplied, so save it.
* @debugfs_list: node in debugfs list of command submissions.
@@ -881,7 +882,8 @@ struct hl_ctx_mgr {
struct hl_userptr {
enum vm_type_t vm_type; /* must be first */
struct list_head job_node;
- struct frame_vector *vec;
+ struct page **pages;
+ unsigned int npages;
struct sg_table *sgt;
enum dma_data_direction dir;
struct list_head debugfs_list;
diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 5ff4688683fd..327b64479f97 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1281,45 +1281,41 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
return -EFAULT;
}

- userptr->vec = frame_vector_create(npages);
- if (!userptr->vec) {
+ userptr->pages = kvmalloc_array(npages, sizeof(*userptr->pages),
+ GFP_KERNEL);
+ if (!userptr->pages) {
dev_err(hdev->dev, "Failed to create frame vector\n");
return -ENOMEM;
}

- rc = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
- userptr->vec);
+ rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ userptr->pages);

if (rc != npages) {
dev_err(hdev->dev,
"Failed to map host memory, user ptr probably wrong\n");
if (rc < 0)
- goto destroy_framevec;
+ goto destroy_pages;
+ npages = rc;
rc = -EFAULT;
- goto put_framevec;
- }
-
- if (frame_vector_to_pages(userptr->vec) < 0) {
- dev_err(hdev->dev,
- "Failed to translate frame vector to pages\n");
- rc = -EFAULT;
- goto put_framevec;
+ goto put_pages;
}
+ userptr->npages = npages;

rc = sg_alloc_table_from_pages(userptr->sgt,
- frame_vector_pages(userptr->vec),
- npages, offset, size, GFP_ATOMIC);
+ userptr->pages,
+ npages, offset, size, GFP_ATOMIC);
if (rc < 0) {
dev_err(hdev->dev, "failed to create SG table from pages\n");
- goto put_framevec;
+ goto put_pages;
}

return 0;

-put_framevec:
- put_vaddr_frames(userptr->vec);
-destroy_framevec:
- frame_vector_destroy(userptr->vec);
+put_pages:
+ unpin_user_pages(userptr->pages, npages);
+destroy_pages:
+ kvfree(userptr->pages);
return rc;
}

@@ -1405,8 +1401,6 @@ int hl_pin_host_memory(struct hl_device *hdev, u64 addr, u64 size,
*/
void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
{
- struct page **pages;
-
hl_debugfs_remove_userptr(hdev, userptr);

if (userptr->dma_mapped)
@@ -1414,15 +1408,8 @@ void hl_unpin_host_memory(struct hl_device *hdev, struct hl_userptr *userptr)
userptr->sgt->nents,
userptr->dir);

- pages = frame_vector_pages(userptr->vec);
- if (!IS_ERR(pages)) {
- int i;
-
- for (i = 0; i < frame_vector_count(userptr->vec); i++)
- set_page_dirty_lock(pages[i]);
- }
- put_vaddr_frames(userptr->vec);
- frame_vector_destroy(userptr->vec);
+ unpin_user_pages_dirty_lock(userptr->pages, userptr->npages, true);
+ kvfree(userptr->pages);

list_del(&userptr->job_node);

--
2.28.0

2020-10-21 19:00:43

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 04/16] misc/habana: Use FOLL_LONGTERM for userptr

These are persistent, not just for the duration of a dma operation.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Oded Gabbay <[email protected]>
Cc: Omer Shpigelman <[email protected]>
Cc: Ofir Bitton <[email protected]>
Cc: Tomer Tayar <[email protected]>
Cc: Moti Haimovski <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Pawel Piskorski <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/misc/habanalabs/common/memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
index 327b64479f97..767d3644c033 100644
--- a/drivers/misc/habanalabs/common/memory.c
+++ b/drivers/misc/habanalabs/common/memory.c
@@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
return -ENOMEM;
}

- rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ rc = pin_user_pages_fast(start, npages,
+ FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
userptr->pages);

if (rc != npages) {
--
2.28.0

2020-10-21 19:01:54

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 06/16] media: videobuf2: Move frame_vector into media subsystem

It's the only user. This also garbage collects the CONFIG_FRAME_VECTOR
symbol from all over the tree (well just one place, somehow omap media
driver still had this in its Kconfig, despite not using it).

Reviewed-by: John Hubbard <[email protected]>
Acked-by: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Pawel Osciak <[email protected]>
Cc: Marek Szyprowski <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Tomasz Figa <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Daniel Vetter <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
--
v3:
- Create a new frame_vector.h header for this (Mauro)
---
drivers/media/common/videobuf2/Kconfig | 1 -
drivers/media/common/videobuf2/Makefile | 1 +
.../media/common/videobuf2}/frame_vector.c | 2 +
drivers/media/platform/omap/Kconfig | 1 -
include/linux/mm.h | 42 -----------------
include/media/frame_vector.h | 47 +++++++++++++++++++
include/media/videobuf2-core.h | 1 +
mm/Kconfig | 3 --
mm/Makefile | 1 -
9 files changed, 51 insertions(+), 48 deletions(-)
rename {mm => drivers/media/common/videobuf2}/frame_vector.c (99%)
create mode 100644 include/media/frame_vector.h

diff --git a/drivers/media/common/videobuf2/Kconfig b/drivers/media/common/videobuf2/Kconfig
index edbc99ebba87..d2223a12c95f 100644
--- a/drivers/media/common/videobuf2/Kconfig
+++ b/drivers/media/common/videobuf2/Kconfig
@@ -9,7 +9,6 @@ config VIDEOBUF2_V4L2

config VIDEOBUF2_MEMOPS
tristate
- select FRAME_VECTOR

config VIDEOBUF2_DMA_CONTIG
tristate
diff --git a/drivers/media/common/videobuf2/Makefile b/drivers/media/common/videobuf2/Makefile
index 77bebe8b202f..54306f8d096c 100644
--- a/drivers/media/common/videobuf2/Makefile
+++ b/drivers/media/common/videobuf2/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
videobuf2-common-objs := videobuf2-core.o
+videobuf2-common-objs += frame_vector.o

ifeq ($(CONFIG_TRACEPOINTS),y)
videobuf2-common-objs += vb2-trace.o
diff --git a/mm/frame_vector.c b/drivers/media/common/videobuf2/frame_vector.c
similarity index 99%
rename from mm/frame_vector.c
rename to drivers/media/common/videobuf2/frame_vector.c
index d44779e56313..6590987c14bd 100644
--- a/mm/frame_vector.c
+++ b/drivers/media/common/videobuf2/frame_vector.c
@@ -8,6 +8,8 @@
#include <linux/pagemap.h>
#include <linux/sched.h>

+#include <media/frame_vector.h>
+
/**
* get_vaddr_frames() - map virtual addresses to pfns
* @start: starting user address
diff --git a/drivers/media/platform/omap/Kconfig b/drivers/media/platform/omap/Kconfig
index f73b5893220d..de16de46c0f4 100644
--- a/drivers/media/platform/omap/Kconfig
+++ b/drivers/media/platform/omap/Kconfig
@@ -12,6 +12,5 @@ config VIDEO_OMAP2_VOUT
depends on VIDEO_V4L2
select VIDEOBUF2_DMA_CONTIG
select OMAP2_VRFB if ARCH_OMAP2 || ARCH_OMAP3
- select FRAME_VECTOR
help
V4L2 Display driver support for OMAP2/3 based boards.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 16b799a0522c..acd60fbf1a5a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1743,48 +1743,6 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
struct task_struct *task, bool bypass_rlim);

-/* Container for pinned pfns / pages */
-struct frame_vector {
- unsigned int nr_allocated; /* Number of frames we have space for */
- unsigned int nr_frames; /* Number of frames stored in ptrs array */
- bool got_ref; /* Did we pin pages by getting page ref? */
- bool is_pfns; /* Does array contain pages or pfns? */
- void *ptrs[]; /* Array of pinned pfns / pages. Use
- * pfns_vector_pages() or pfns_vector_pfns()
- * for access */
-};
-
-struct frame_vector *frame_vector_create(unsigned int nr_frames);
-void frame_vector_destroy(struct frame_vector *vec);
-int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
- unsigned int gup_flags, struct frame_vector *vec);
-void put_vaddr_frames(struct frame_vector *vec);
-int frame_vector_to_pages(struct frame_vector *vec);
-void frame_vector_to_pfns(struct frame_vector *vec);
-
-static inline unsigned int frame_vector_count(struct frame_vector *vec)
-{
- return vec->nr_frames;
-}
-
-static inline struct page **frame_vector_pages(struct frame_vector *vec)
-{
- if (vec->is_pfns) {
- int err = frame_vector_to_pages(vec);
-
- if (err)
- return ERR_PTR(err);
- }
- return (struct page **)(vec->ptrs);
-}
-
-static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
-{
- if (!vec->is_pfns)
- frame_vector_to_pfns(vec);
- return (unsigned long *)(vec->ptrs);
-}
-
struct kvec;
int get_kernel_pages(const struct kvec *iov, int nr_pages, int write,
struct page **pages);
diff --git a/include/media/frame_vector.h b/include/media/frame_vector.h
new file mode 100644
index 000000000000..1ed0cd64510d
--- /dev/null
+++ b/include/media/frame_vector.h
@@ -0,0 +1,47 @@
+// SPDX-License-Identifier: GPL-2.0
+#ifndef _MEDIA_FRAME_VECTOR_H
+#define _MEDIA_FRAME_VECTOR_H
+
+/* Container for pinned pfns / pages in frame_vector.c */
+struct frame_vector {
+ unsigned int nr_allocated; /* Number of frames we have space for */
+ unsigned int nr_frames; /* Number of frames stored in ptrs array */
+ bool got_ref; /* Did we pin pages by getting page ref? */
+ bool is_pfns; /* Does array contain pages or pfns? */
+ void *ptrs[]; /* Array of pinned pfns / pages. Use
+ * pfns_vector_pages() or pfns_vector_pfns()
+ * for access */
+};
+
+struct frame_vector *frame_vector_create(unsigned int nr_frames);
+void frame_vector_destroy(struct frame_vector *vec);
+int get_vaddr_frames(unsigned long start, unsigned int nr_pfns,
+ unsigned int gup_flags, struct frame_vector *vec);
+void put_vaddr_frames(struct frame_vector *vec);
+int frame_vector_to_pages(struct frame_vector *vec);
+void frame_vector_to_pfns(struct frame_vector *vec);
+
+static inline unsigned int frame_vector_count(struct frame_vector *vec)
+{
+ return vec->nr_frames;
+}
+
+static inline struct page **frame_vector_pages(struct frame_vector *vec)
+{
+ if (vec->is_pfns) {
+ int err = frame_vector_to_pages(vec);
+
+ if (err)
+ return ERR_PTR(err);
+ }
+ return (struct page **)(vec->ptrs);
+}
+
+static inline unsigned long *frame_vector_pfns(struct frame_vector *vec)
+{
+ if (!vec->is_pfns)
+ frame_vector_to_pfns(vec);
+ return (unsigned long *)(vec->ptrs);
+}
+
+#endif /* _MEDIA_FRAME_VECTOR_H */
diff --git a/include/media/videobuf2-core.h b/include/media/videobuf2-core.h
index bbb3f26fbde9..d045e3a5a1d8 100644
--- a/include/media/videobuf2-core.h
+++ b/include/media/videobuf2-core.h
@@ -18,6 +18,7 @@
#include <linux/dma-buf.h>
#include <linux/bitops.h>
#include <media/media-request.h>
+#include <media/frame_vector.h>

#define VB2_MAX_FRAME (32)
#define VB2_MAX_PLANES (8)
diff --git a/mm/Kconfig b/mm/Kconfig
index 6c974888f86f..da6c943fe9f1 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -815,9 +815,6 @@ config DEVICE_PRIVATE
memory; i.e., memory that is only accessible from the device (or
group of devices). You likely also want to select HMM_MIRROR.

-config FRAME_VECTOR
- bool
-
config ARCH_USES_HIGH_VMA_FLAGS
bool
config ARCH_HAS_PKEYS
diff --git a/mm/Makefile b/mm/Makefile
index d5649f1c12c0..a025fd6c6afd 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -111,7 +111,6 @@ obj-$(CONFIG_PAGE_EXTENSION) += page_ext.o
obj-$(CONFIG_CMA_DEBUGFS) += cma_debug.o
obj-$(CONFIG_USERFAULTFD) += userfaultfd.o
obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
-obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
--
2.28.0

2020-10-21 19:02:56

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 09/16] mm: Add unsafe_follow_pfn

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see 3234ac664a87
("/dev/mem: Revoke mappings when a driver claims the region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea.

Unfortunately there's some users where this is not fixable (like v4l
userptr of iomem mappings) or involves a pile of work (vfio type1
iommu). For now annotate these as unsafe and splat appropriately.

This patch adds an unsafe_follow_pfn, which later patches will then
roll out to all appropriate places.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
include/linux/mm.h | 2 ++
mm/memory.c | 32 +++++++++++++++++++++++++++++++-
mm/nommu.c | 17 +++++++++++++++++
security/Kconfig | 13 +++++++++++++
4 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2a16631c1fda..ec8c90928fc9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1653,6 +1653,8 @@ int follow_pte_pmd(struct mm_struct *mm, unsigned long address,
pte_t **ptepp, pmd_t **pmdpp, spinlock_t **ptlp);
int follow_pfn(struct vm_area_struct *vma, unsigned long address,
unsigned long *pfn);
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+ unsigned long *pfn);
int follow_phys(struct vm_area_struct *vma, unsigned long address,
unsigned int flags, unsigned long *prot, resource_size_t *phys);
int generic_access_phys(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index fff817608eb4..bcba4e8f1501 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4788,7 +4788,12 @@ EXPORT_SYMBOL(follow_pte_pmd);
* @address: user virtual address
* @pfn: location to store found PFN
*
- * Only IO mappings and raw PFN mappings are allowed.
+ * Only IO mappings and raw PFN mappings are allowed. Note that callers must
+ * ensure coherency with pte updates by using a &mmu_notifier to follow updates.
+ * If this is not feasible, or the access to the @pfn is only very short term,
+ * use follow_pte_pmd() instead and hold the pagetable lock for the duration of
+ * the access instead. Any caller not following these requirements must use
+ * unsafe_follow_pfn() instead.
*
* Return: zero and the pfn at @pfn on success, -ve otherwise.
*/
@@ -4811,6 +4816,31 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
}
EXPORT_SYMBOL(follow_pfn);

+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+ unsigned long *pfn)
+{
+#ifdef CONFIG_STRICT_FOLLOW_PFN
+ pr_info("unsafe follow_pfn usage rejected, see CONFIG_STRICT_FOLLOW_PFN\n");
+ return -EINVAL;
+#else
+ WARN_ONCE(1, "unsafe follow_pfn usage\n");
+ add_taint(TAINT_USER, LOCKDEP_STILL_OK);
+
+ return follow_pfn(vma, address, pfn);
+#endif
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
#ifdef CONFIG_HAVE_IOREMAP_PROT
int follow_phys(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
diff --git a/mm/nommu.c b/mm/nommu.c
index 75a327149af1..3db2910f0d64 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -132,6 +132,23 @@ int follow_pfn(struct vm_area_struct *vma, unsigned long address,
}
EXPORT_SYMBOL(follow_pfn);

+/**
+ * unsafe_follow_pfn - look up PFN at a user virtual address
+ * @vma: memory mapping
+ * @address: user virtual address
+ * @pfn: location to store found PFN
+ *
+ * Only IO mappings and raw PFN mappings are allowed.
+ *
+ * Returns zero and the pfn at @pfn on success, -ve otherwise.
+ */
+int unsafe_follow_pfn(struct vm_area_struct *vma, unsigned long address,
+ unsigned long *pfn)
+{
+ return follow_pfn(vma, address, pfn);
+}
+EXPORT_SYMBOL(unsafe_follow_pfn);
+
LIST_HEAD(vmap_area_list);

void vfree(const void *addr)
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f99f1d..48945402e103 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -230,6 +230,19 @@ config STATIC_USERMODEHELPER_PATH
If you wish for all usermode helper programs to be disabled,
specify an empty string here (i.e. "").

+config STRICT_FOLLOW_PFN
+ bool "Disable unsafe use of follow_pfn"
+ depends on MMU
+ help
+ Some functionality in the kernel follows userspace mappings to iomem
+ ranges in an unsafe matter. Examples include v4l userptr for zero-copy
+ buffers sharing.
+
+ If this option is switched on, such access is rejected. Only enable
+ this option when you must run userspace which requires this.
+
+ If in doubt, say Y.
+
source "security/selinux/Kconfig"
source "security/smack/Kconfig"
source "security/tomoyo/Kconfig"
--
2.28.0

2020-10-21 19:05:56

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 13/16] /dev/mem: Only set filp->f_mapping

When we care about pagecache maintenance, we need to make sure that
both f_mapping and i_mapping point at the right mapping.

But for iomem mappings we only care about the virtual/pte side of
things, so f_mapping is enough. Also setting inode->i_mapping was
confusing me as a driver maintainer, since in e.g. drivers/gpu we
don't do that. Per Dan this seems to be copypasta from places which do
care about pagecache consistency, but not needed. Hence remove it for
slightly less confusion.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/char/mem.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/char/mem.c b/drivers/char/mem.c
index abd4ffdc8cde..5502f56f3655 100644
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -864,7 +864,6 @@ static int open_port(struct inode *inode, struct file *filp)
* revocations when drivers want to take over a /dev/mem mapped
* range.
*/
- inode->i_mapping = devmem_inode->i_mapping;
filp->f_mapping = inode->i_mapping;

return 0;
--
2.28.0

2020-10-21 21:50:28

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 01/16] drm/exynos: Stop using frame_vector helpers

All we need are a pages array, pin_user_pages_fast can give us that
directly. Plus this avoids the entire raw pfn side of get_vaddr_frames.

Reviewed-by: John Hubbard <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Inki Dae <[email protected]>
Cc: Joonyoung Shim <[email protected]>
Cc: Seung-Woo Kim <[email protected]>
Cc: Kyungmin Park <[email protected]>
Cc: Kukjin Kim <[email protected]>
Cc: Krzysztof Kozlowski <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
--
v2: Use unpin_user_pages_dirty_lock (John)
---
drivers/gpu/drm/exynos/Kconfig | 1 -
drivers/gpu/drm/exynos/exynos_drm_g2d.c | 47 +++++++++++--------------
2 files changed, 20 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 6417f374b923..43257ef3c09d 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -88,7 +88,6 @@ comment "Sub-drivers"
config DRM_EXYNOS_G2D
bool "G2D"
depends on VIDEO_SAMSUNG_S5P_G2D=n || COMPILE_TEST
- select FRAME_VECTOR
help
Choose this option if you want to use Exynos G2D for DRM.

diff --git a/drivers/gpu/drm/exynos/exynos_drm_g2d.c b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
index 967a5cdc120e..ecede41af9b9 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_g2d.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_g2d.c
@@ -205,7 +205,8 @@ struct g2d_cmdlist_userptr {
dma_addr_t dma_addr;
unsigned long userptr;
unsigned long size;
- struct frame_vector *vec;
+ struct page **pages;
+ unsigned int npages;
struct sg_table *sgt;
atomic_t refcount;
bool in_pool;
@@ -378,7 +379,6 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
bool force)
{
struct g2d_cmdlist_userptr *g2d_userptr = obj;
- struct page **pages;

if (!obj)
return;
@@ -398,15 +398,9 @@ static void g2d_userptr_put_dma_addr(struct g2d_data *g2d,
dma_unmap_sgtable(to_dma_dev(g2d->drm_dev), g2d_userptr->sgt,
DMA_BIDIRECTIONAL, 0);

- pages = frame_vector_pages(g2d_userptr->vec);
- if (!IS_ERR(pages)) {
- int i;
-
- for (i = 0; i < frame_vector_count(g2d_userptr->vec); i++)
- set_page_dirty_lock(pages[i]);
- }
- put_vaddr_frames(g2d_userptr->vec);
- frame_vector_destroy(g2d_userptr->vec);
+ unpin_user_pages_dirty_lock(g2d_userptr->pages, g2d_userptr->npages,
+ true);
+ kvfree(g2d_userptr->pages);

if (!g2d_userptr->out_of_list)
list_del_init(&g2d_userptr->list);
@@ -474,35 +468,34 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
offset = userptr & ~PAGE_MASK;
end = PAGE_ALIGN(userptr + size);
npages = (end - start) >> PAGE_SHIFT;
- g2d_userptr->vec = frame_vector_create(npages);
- if (!g2d_userptr->vec) {
+ g2d_userptr->pages = kvmalloc_array(npages, sizeof(*g2d_userptr->pages),
+ GFP_KERNEL);
+ if (!g2d_userptr->pages) {
ret = -ENOMEM;
goto err_free;
}

- ret = get_vaddr_frames(start, npages, FOLL_FORCE | FOLL_WRITE,
- g2d_userptr->vec);
+ ret = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
+ g2d_userptr->pages);
if (ret != npages) {
DRM_DEV_ERROR(g2d->dev,
"failed to get user pages from userptr.\n");
if (ret < 0)
- goto err_destroy_framevec;
- ret = -EFAULT;
- goto err_put_framevec;
- }
- if (frame_vector_to_pages(g2d_userptr->vec) < 0) {
+ goto err_destroy_pages;
+ npages = ret;
ret = -EFAULT;
- goto err_put_framevec;
+ goto err_unpin_pages;
}
+ g2d_userptr->npages = npages;

sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
if (!sgt) {
ret = -ENOMEM;
- goto err_put_framevec;
+ goto err_unpin_pages;
}

ret = sg_alloc_table_from_pages(sgt,
- frame_vector_pages(g2d_userptr->vec),
+ g2d_userptr->pages,
npages, offset, size, GFP_KERNEL);
if (ret < 0) {
DRM_DEV_ERROR(g2d->dev, "failed to get sgt from pages.\n");
@@ -538,11 +531,11 @@ static dma_addr_t *g2d_userptr_get_dma_addr(struct g2d_data *g2d,
err_free_sgt:
kfree(sgt);

-err_put_framevec:
- put_vaddr_frames(g2d_userptr->vec);
+err_unpin_pages:
+ unpin_user_pages(g2d_userptr->pages, npages);

-err_destroy_framevec:
- frame_vector_destroy(g2d_userptr->vec);
+err_destroy_pages:
+ kvfree(g2d_userptr->pages);

err_free:
kfree(g2d_userptr);
--
2.28.0

2020-10-21 21:50:38

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 11/16] vfio/type1: Mark follow_pfn as unsafe

The code seems to stuff these pfns into iommu pts (or something like
that, I didn't follow), but there's no mmu_notifier to ensure that
access is synchronized with pte updates.

Hence mark these as unsafe. This means that with
CONFIG_STRICT_FOLLOW_PFN, these will be rejected.

Real fix is to wire up an mmu_notifier ... somehow. Probably means any
invalidate is a fatal fault for this vfio device, but then this
shouldn't ever happen if userspace is reasonable.

Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Alex Williamson <[email protected]>
Cc: Cornelia Huck <[email protected]>
Cc: [email protected]
Signed-off-by: Daniel Vetter <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5fbf0c1f7433..a4d53f3d0a35 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -421,7 +421,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
{
int ret;

- ret = follow_pfn(vma, vaddr, pfn);
+ ret = unsafe_follow_pfn(vma, vaddr, pfn);
if (ret) {
bool unlocked = false;

@@ -435,7 +435,7 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
if (ret)
return ret;

- ret = follow_pfn(vma, vaddr, pfn);
+ ret = unsafe_follow_pfn(vma, vaddr, pfn);
}

return ret;
--
2.28.0

2020-10-21 21:50:47

by Daniel Vetter

[permalink] [raw]
Subject: [PATCH v3 08/16] s390/pci: Remove races against pte updates

Way back it was a reasonable assumptions that iomem mappings never
change the pfn range they point at. But this has changed:

- gpu drivers dynamically manage their memory nowadays, invalidating
ptes with unmap_mapping_range when buffers get moved

- contiguous dma allocations have moved from dedicated carvetouts to
cma regions. This means if we miss the unmap the pfn might contain
pagecache or anon memory (well anything allocated with GFP_MOVEABLE)

- even /dev/mem now invalidates mappings when the kernel requests that
iomem region when CONFIG_IO_STRICT_DEVMEM is set, see commit
3234ac664a87 ("/dev/mem: Revoke mappings when a driver claims the
region")

Accessing pfns obtained from ptes without holding all the locks is
therefore no longer a good idea. Fix this.

Since zpci_memcpy_from|toio seems to not do anything nefarious with
locks we just need to open code get_pfn and follow_pfn and make sure
we drop the locks only after we're done. The write function also needs
the copy_from_user move, since we can't take userspace faults while
holding the mmap sem.

v2: Move VM_IO | VM_PFNMAP checks around so they keep returning EINVAL
like before (Gerard)

v3: Polish commit message (Niklas)

Reviewed-by: Gerald Schaefer <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Niklas Schnelle <[email protected]>
Cc: Gerald Schaefer <[email protected]>
Cc: [email protected]
Cc: Niklas Schnelle <[email protected]>
Signed-off-by: Daniel Vetter <[email protected]>
---
arch/s390/pci/pci_mmio.c | 98 +++++++++++++++++++++++-----------------
1 file changed, 57 insertions(+), 41 deletions(-)

diff --git a/arch/s390/pci/pci_mmio.c b/arch/s390/pci/pci_mmio.c
index 401cf670a243..1a6adbc68ee8 100644
--- a/arch/s390/pci/pci_mmio.c
+++ b/arch/s390/pci/pci_mmio.c
@@ -119,33 +119,15 @@ static inline int __memcpy_toio_inuser(void __iomem *dst,
return rc;
}

-static long get_pfn(unsigned long user_addr, unsigned long access,
- unsigned long *pfn)
-{
- struct vm_area_struct *vma;
- long ret;
-
- mmap_read_lock(current->mm);
- ret = -EINVAL;
- vma = find_vma(current->mm, user_addr);
- if (!vma)
- goto out;
- ret = -EACCES;
- if (!(vma->vm_flags & access))
- goto out;
- ret = follow_pfn(vma, user_addr, pfn);
-out:
- mmap_read_unlock(current->mm);
- return ret;
-}
-
SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
const void __user *, user_buffer, size_t, length)
{
u8 local_buf[64];
void __iomem *io_addr;
void *buf;
- unsigned long pfn;
+ struct vm_area_struct *vma;
+ pte_t *ptep;
+ spinlock_t *ptl;
long ret;

if (!zpci_is_enabled())
@@ -158,7 +140,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
* We only support write access to MIO capable devices if we are on
* a MIO enabled system. Otherwise we would have to check for every
* address if it is a special ZPCI_ADDR and would have to do
- * a get_pfn() which we don't need for MIO capable devices. Currently
+ * a pfn lookup which we don't need for MIO capable devices. Currently
* ISM devices are the only devices without MIO support and there is no
* known need for accessing these from userspace.
*/
@@ -176,21 +158,37 @@ SYSCALL_DEFINE3(s390_pci_mmio_write, unsigned long, mmio_addr,
} else
buf = local_buf;

- ret = get_pfn(mmio_addr, VM_WRITE, &pfn);
+ ret = -EFAULT;
+ if (copy_from_user(buf, user_buffer, length))
+ goto out_free;
+
+ mmap_read_lock(current->mm);
+ ret = -EINVAL;
+ vma = find_vma(current->mm, mmio_addr);
+ if (!vma)
+ goto out_unlock_mmap;
+ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+ goto out_unlock_mmap;
+ ret = -EACCES;
+ if (!(vma->vm_flags & VM_WRITE))
+ goto out_unlock_mmap;
+
+ ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
if (ret)
- goto out;
- io_addr = (void __iomem *)((pfn << PAGE_SHIFT) |
+ goto out_unlock_mmap;
+
+ io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
(mmio_addr & ~PAGE_MASK));

- ret = -EFAULT;
if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE)
- goto out;
-
- if (copy_from_user(buf, user_buffer, length))
- goto out;
+ goto out_unlock_pt;

ret = zpci_memcpy_toio(io_addr, buf, length);
-out:
+out_unlock_pt:
+ pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+ mmap_read_unlock(current->mm);
+out_free:
if (buf != local_buf)
kfree(buf);
return ret;
@@ -274,7 +272,9 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
u8 local_buf[64];
void __iomem *io_addr;
void *buf;
- unsigned long pfn;
+ struct vm_area_struct *vma;
+ pte_t *ptep;
+ spinlock_t *ptl;
long ret;

if (!zpci_is_enabled())
@@ -287,7 +287,7 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
* We only support read access to MIO capable devices if we are on
* a MIO enabled system. Otherwise we would have to check for every
* address if it is a special ZPCI_ADDR and would have to do
- * a get_pfn() which we don't need for MIO capable devices. Currently
+ * a pfn lookup which we don't need for MIO capable devices. Currently
* ISM devices are the only devices without MIO support and there is no
* known need for accessing these from userspace.
*/
@@ -306,22 +306,38 @@ SYSCALL_DEFINE3(s390_pci_mmio_read, unsigned long, mmio_addr,
buf = local_buf;
}

- ret = get_pfn(mmio_addr, VM_READ, &pfn);
+ mmap_read_lock(current->mm);
+ ret = -EINVAL;
+ vma = find_vma(current->mm, mmio_addr);
+ if (!vma)
+ goto out_unlock_mmap;
+ if (!(vma->vm_flags & (VM_IO | VM_PFNMAP)))
+ goto out_unlock_mmap;
+ ret = -EACCES;
+ if (!(vma->vm_flags & VM_WRITE))
+ goto out_unlock_mmap;
+
+ ret = follow_pte_pmd(vma->vm_mm, mmio_addr, NULL, &ptep, NULL, &ptl);
if (ret)
- goto out;
- io_addr = (void __iomem *)((pfn << PAGE_SHIFT) | (mmio_addr & ~PAGE_MASK));
+ goto out_unlock_mmap;
+
+ io_addr = (void __iomem *)((pte_pfn(*ptep) << PAGE_SHIFT) |
+ (mmio_addr & ~PAGE_MASK));

if ((unsigned long) io_addr < ZPCI_IOMAP_ADDR_BASE) {
ret = -EFAULT;
- goto out;
+ goto out_unlock_pt;
}
ret = zpci_memcpy_fromio(buf, io_addr, length);
- if (ret)
- goto out;
- if (copy_to_user(user_buffer, buf, length))
+
+out_unlock_pt:
+ pte_unmap_unlock(ptep, ptl);
+out_unlock_mmap:
+ mmap_read_unlock(current->mm);
+
+ if (!ret && copy_to_user(user_buffer, buf, length))
ret = -EFAULT;

-out:
if (buf != local_buf)
kfree(buf);
return ret;
--
2.28.0

2020-10-22 00:42:49

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v3 00/16] follow_pfn and other iomap races

On Wed, Oct 21, 2020 at 10:56:39AM +0200, Daniel Vetter wrote:
> Hi all,
>
> Round 3 of my patch series to clamp down a bunch of races and gaps
> around follow_pfn and other access to iomem mmaps. Previous version:
>
> v1: https://lore.kernel.org/dri-devel/[email protected]/
> v2: https://lore.kernel.org/dri-devel/[email protected]
>
> And the discussion that sparked this journey:
>
> https://lore.kernel.org/dri-devel/[email protected]/
>
> I was waiting for the testing result for habanalabs from Oded, but I guess
> Oded was waiting for my v3.
>
> Changes in v3:
> - Bunch of polish all over, no functional changes aside from one barrier
> in the resource code, for consistency.
> - A few more r-b tags.
>
> Changes in v2:
> - tons of small polish&fixes all over, thanks to all the reviewers who
> spotted issues
> - I managed to test at least the generic_access_phys and pci mmap revoke
> stuff with a few gdb sessions using our i915 debug tools (hence now also
> the drm/i915 patch to properly request all the pci bar regions)
> - reworked approach for the pci mmap revoke: Infrastructure moved into
> kernel/resource.c, address_space mapping is now set up at open time for
> everyone (which required some sysfs changes). Does indeed look a lot
> cleaner and a lot less invasive than I feared at first.
>
> The big thing I can't test are all the frame_vector changes in habanalbas,
> exynos and media. Gerald has given the s390 patch a spin already.
>
> Review, testing, feedback all very much welcome.
>
> Daniel Vetter (16):
> drm/exynos: Stop using frame_vector helpers
> drm/exynos: Use FOLL_LONGTERM for g2d cmdlists
> misc/habana: Stop using frame_vector helpers
> misc/habana: Use FOLL_LONGTERM for userptr
> mm/frame-vector: Use FOLL_LONGTERM
> media: videobuf2: Move frame_vector into media subsystem
> mm: Close race in generic_access_phys
> s390/pci: Remove races against pte updates
> mm: Add unsafe_follow_pfn
> media/videbuf1|2: Mark follow_pfn usage as unsafe
> vfio/type1: Mark follow_pfn as unsafe
> PCI: Obey iomem restrictions for procfs mmap
> /dev/mem: Only set filp->f_mapping
> resource: Move devmem revoke code to resource framework
> sysfs: Support zapping of binary attr mmaps
> PCI: Revoke mappings like devmem

The whole thing looks like a great improvement!

Thanks,
Jason

2020-10-22 06:32:47

by Dan Williams

[permalink] [raw]
Subject: Re: [PATCH v3 13/16] /dev/mem: Only set filp->f_mapping

On Wed, Oct 21, 2020 at 1:57 AM Daniel Vetter <[email protected]> wrote:
>
> When we care about pagecache maintenance, we need to make sure that
> both f_mapping and i_mapping point at the right mapping.
>
> But for iomem mappings we only care about the virtual/pte side of
> things, so f_mapping is enough. Also setting inode->i_mapping was
> confusing me as a driver maintainer, since in e.g. drivers/gpu we
> don't do that. Per Dan this seems to be copypasta from places which do
> care about pagecache consistency, but not needed. Hence remove it for
> slightly less confusion.
>
> Signed-off-by: Daniel Vetter <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Kees Cook <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Dan Williams <[email protected]>

Reviewed-by: Dan Williams <[email protected]>

2020-10-23 09:15:18

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 04/16] misc/habana: Use FOLL_LONGTERM for userptr

Hi Oded,

Did testing on your end turn up anything, or can I put an
ack&tested-by from you on the two habana patches for the next round?

Thanks, Daniel

On Wed, Oct 21, 2020 at 10:57 AM Daniel Vetter <[email protected]> wrote:
>
> These are persistent, not just for the duration of a dma operation.
>
> Signed-off-by: Daniel Vetter <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: Jérôme Glisse <[email protected]>
> Cc: Jan Kara <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Oded Gabbay <[email protected]>
> Cc: Omer Shpigelman <[email protected]>
> Cc: Ofir Bitton <[email protected]>
> Cc: Tomer Tayar <[email protected]>
> Cc: Moti Haimovski <[email protected]>
> Cc: Daniel Vetter <[email protected]>
> Cc: Greg Kroah-Hartman <[email protected]>
> Cc: Pawel Piskorski <[email protected]>
> Signed-off-by: Daniel Vetter <[email protected]>
> ---
> drivers/misc/habanalabs/common/memory.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/misc/habanalabs/common/memory.c b/drivers/misc/habanalabs/common/memory.c
> index 327b64479f97..767d3644c033 100644
> --- a/drivers/misc/habanalabs/common/memory.c
> +++ b/drivers/misc/habanalabs/common/memory.c
> @@ -1288,7 +1288,8 @@ static int get_user_memory(struct hl_device *hdev, u64 addr, u64 size,
> return -ENOMEM;
> }
>
> - rc = pin_user_pages_fast(start, npages, FOLL_FORCE | FOLL_WRITE,
> + rc = pin_user_pages_fast(start, npages,
> + FOLL_FORCE | FOLL_WRITE | FOLL_LONGTERM,
> userptr->pages);
>
> if (rc != npages) {
> --
> 2.28.0
>


--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch