2019-11-12 00:12:19

by John Hubbard

Subject: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

Hi,

The cover letter is long, so the more important stuff is first:

* Jason, if you or someone could look at the VFIO cleanup (patch 8)
and conversion to FOLL_PIN (patch 18), to make sure its use of
remote and longterm gup matches what we discussed during the review
of v2, I'd appreciate it.

* Also for Jason and IB: as noted below, in patch 11, I am (too?) boldly
converting from put_user_pages() to release_pages().

* Jerome, I am going to take a look at doing your FOLL_GET change idea
(some callers should set FOLL_GET) separately, because it blew up "a
little bit" in my face. It's definitely a separate--tiny, but risky--project.
It also looks more reasonable when applied on top of this series here
(and it conflicts a lot), so I'm going to send it as a follow-up.

Changes since v2:

* Added a patch to convert IB/umem from normal gup to gup_fast(). This
is also posted separately, in order to hopefully get some runtime
testing.

* Changed the page devmap code to be a little clearer,
thanks to Jerome for that.

* Split out the page devmap changes into a separate patch (and moved
Ira's Signed-off-by to that patch).

* Fixed my bug in IB: ODP code does not require pin_user_pages()
semantics. Therefore, revert the put_user_page() calls to put_page(),
and leave the get_user_pages() call as-is.

* As part of the revert, I am proposing here a change directly
from put_user_pages() to release_pages(). I'd feel better if
someone agrees that this is the best way. It uses the more
efficient release_pages() instead of put_page() in a loop,
and keeps the change to just a few characters on one line,
but OTOH it is not a pure revert. (A minimal sketch of the
difference appears just after this list of changes.)

* Loosened the FOLL_LONGTERM restrictions in the
__get_user_pages_locked() implementation, and used that in order
to fix up a VFIO bug. Thanks to Jason for that idea.

* Note the use of release_pages() in IB: is that OK?

* Added a few more WARN()s and clarifying comments nearby.

* Many documentation improvements in various comments.

* Moved the new pin_user_pages.rst from Documentation/vm/ to
Documentation/core-api/ .

* Commit descriptions: added clarifying notes to the three patches
(drm/via, fs/io_uring, net/xdp) that already had put_user_page()
calls in place.

* Collected all pending Reviewed-by and Acked-by tags, from v1 and v2
email threads.

* Lots of churn from v2 --> v3, so it's possible that new bugs
sneaked in.
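
For reference, here is a minimal sketch of the put_page()-loop vs.
release_pages() difference mentioned above. The variable names are
hypothetical; the actual one-line change is in patch 11:

        /* A pure revert would restore a loop like this: */
        for (i = 0; i < npages; i++)
                put_page(page_list[i]);

        /* The proposed change batches the refcount drops instead: */
        release_pages(page_list, npages);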

NOT DONE: a separate patchset is required:

* __get_user_pages_locked(): stop compensating for
buggy callers who failed to set FOLL_GET. Instead, assert
that FOLL_GET is set (and fail if it's not).

======================================================================
Original cover letter (edited to fix up the patch description numbers)

This applies cleanly to linux-next and mmotm, and also to linux.git if
linux-next's commit 20cac10710c9 ("mm/gup_benchmark: fix MAP_HUGETLB
case") is first applied there.

This provides tracking of dma-pinned pages. This is a prerequisite to
solving the larger problem of proper interactions between file-backed
pages, and [R]DMA activities, as discussed in [1], [2], [3], and in
a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN, is introduced, and thoroughly
documented in Documentation/core-api/pin_user_pages.rst (added by this series).

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:

get_user_pages() (sets FOLL_GET)
put_page()

to this:
pin_user_pages() (sets FOLL_PIN)
put_user_page()

Because there are interdependencies with FOLL_LONGTERM, a similar
conversion was applied to the FOLL_LONGTERM call sites. The change was from this:

get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
put_page()

to this:
pin_longterm_pages() (sets FOLL_PIN | FOLL_LONGTERM)
put_user_page()
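
As a rough, hypothetical sketch of that opt-in pattern at a typical
call site (the variable names here are made up; pin_user_pages() and
the FOLL_PIN semantics behind put_user_page() are what this series
introduces):

        /* Before: takes normal page references (FOLL_GET): */
        nr_pinned = get_user_pages(start, nr_pages, FOLL_WRITE, pages, NULL);
        if (nr_pinned <= 0)
                return nr_pinned;
        /* ... hand pages[] to DIO / RDMA ... */
        for (i = 0; i < nr_pinned; i++)
                put_page(pages[i]);

        /* After: takes dma-pinned references (FOLL_PIN): */
        nr_pinned = pin_user_pages(start, nr_pages, FOLL_WRITE, pages, NULL);
        if (nr_pinned <= 0)
                return nr_pinned;
        /* ... hand pages[] to DIO / RDMA ... */
        for (i = 0; i < nr_pinned; i++)
                put_user_page(pages[i]);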

============================================================
Patch summary:

* Patches 1-8: refactoring and preparatory cleanup, independent fixes

* Patch 9: introduce pin_user_pages() and FOLL_PIN, but no functional
changes yet
* Patches 10-15: convert existing put_user_page() callers to use the
new pin*() calls
* Patch 16: activate tracking of FOLL_PIN pages
* Patches 17-19: convert FOLL_LONGTERM callers
* Patches 20-22: gup_benchmark and run_vmtests support
* Patch 23: enforce FOLL_LONGTERM as a gup-internal (only) flag

============================================================
Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. And as you
can see, gup_benchmark is enhanced to exercise this. Basically, I've been
able to runtime test the core get_user_pages() and pin_user_pages() and
related routines, but not so much several of the call sites -- although
those are generally just a couple of lines changed, each.

Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.

Also, my runtime testing for the call sites so far is very weak:

* io_uring: Some directed tests from liburing exercise this, and they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and passes.
* infiniband (still only have crude "IB pingpong" working, on a
good day: it's not exercising my conversions at runtime...)
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

============================================================
Next:

* Get the block/bio_vec sites converted to use pin_user_pages().

* Work with Ira and Dave Chinner to weave this together with the
layout lease stuff.

============================================================

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

John Hubbard (23):
mm/gup: pass flags arg to __gup_device_* functions
mm/gup: factor out duplicate code from four routines
mm/gup: move try_get_compound_head() to top, fix minor issues
mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages
goldfish_pipe: rename local pin_user_pages() routine
IB/umem: use get_user_pages_fast() to pin DMA pages
media/v4l2-core: set pages dirty upon releasing DMA buffers
vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldfish_pipe: convert to pin_user_pages() and put_user_page()
IB/{core,hw,umem}: set FOLL_PIN, FOLL_LONGTERM via
pin_longterm_pages*()
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
mm/gup: track FOLL_PIN pages
media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_longterm_pages() and put_user_page()
mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding
"1"
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
mm/gup: remove support for gup(FOLL_LONGTERM)

Documentation/core-api/index.rst | 1 +
Documentation/core-api/pin_user_pages.rst | 218 +++++++
arch/powerpc/mm/book3s64/iommu_api.c | 15 +-
drivers/gpu/drm/via/via_dmablit.c | 2 +-
drivers/infiniband/core/umem.c | 17 +-
drivers/infiniband/core/umem_odp.c | 24 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 8 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/infiniband/sw/siw/siw_mem.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 10 +-
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 35 +-
fs/io_uring.c | 5 +-
include/linux/mm.h | 164 +++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 608 ++++++++++++++++----
mm/gup_benchmark.c | 87 ++-
mm/huge_memory.c | 54 +-
mm/hugetlb.c | 39 +-
mm/memremap.c | 67 +--
mm/process_vm_access.c | 28 +-
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 34 +-
tools/testing/selftests/vm/run_vmtests | 22 +
29 files changed, 1180 insertions(+), 334 deletions(-)
create mode 100644 Documentation/core-api/pin_user_pages.rst

--
2.24.0


2019-11-12 00:12:30

by John Hubbard

Subject: [PATCH v3 05/23] goldfish_pipe: rename local pin_user_pages() routine

Avoid naming conflicts: rename the local static function from
"pin_user_pages()" to "pin_goldfish_pages()".

An upcoming patch will introduce a global pin_user_pages()
function.

Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
drivers/platform/goldfish/goldfish_pipe.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c
index cef0133aa47a..7ed2a21a0bac 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -257,12 +257,12 @@ static int goldfish_pipe_error_convert(int status)
}
}

-static int pin_user_pages(unsigned long first_page,
- unsigned long last_page,
- unsigned int last_page_size,
- int is_write,
- struct page *pages[MAX_BUFFERS_PER_COMMAND],
- unsigned int *iter_last_page_size)
+static int pin_goldfish_pages(unsigned long first_page,
+ unsigned long last_page,
+ unsigned int last_page_size,
+ int is_write,
+ struct page *pages[MAX_BUFFERS_PER_COMMAND],
+ unsigned int *iter_last_page_size)
{
int ret;
int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1;
@@ -354,9 +354,9 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe,
if (mutex_lock_interruptible(&pipe->lock))
return -ERESTARTSYS;

- pages_count = pin_user_pages(first_page, last_page,
- last_page_size, is_write,
- pipe->pages, &iter_last_page_size);
+ pages_count = pin_goldfish_pages(first_page, last_page,
+ last_page_size, is_write,
+ pipe->pages, &iter_last_page_size);
if (pages_count < 0) {
mutex_unlock(&pipe->lock);
return pages_count;
--
2.24.0

2019-11-12 00:12:34

by John Hubbard

Subject: [PATCH v3 06/23] IB/umem: use get_user_pages_fast() to pin DMA pages

And get rid of the mmap_sem calls, as part of that. Note
that get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Reviewed-by: Ira Weiny <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
drivers/infiniband/core/umem.c | 17 ++++++-----------
1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 24244a2f68cc..3d664a2539eb 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -271,16 +271,13 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
sg = umem->sg_head.sgl;

while (npages) {
- down_read(&mm->mmap_sem);
- ret = get_user_pages(cur_base,
- min_t(unsigned long, npages,
- PAGE_SIZE / sizeof (struct page *)),
- gup_flags | FOLL_LONGTERM,
- page_list, NULL);
- if (ret < 0) {
- up_read(&mm->mmap_sem);
+ ret = get_user_pages_fast(cur_base,
+ min_t(unsigned long, npages,
+ PAGE_SIZE /
+ sizeof(struct page *)),
+ gup_flags | FOLL_LONGTERM, page_list);
+ if (ret < 0)
goto umem_release;
- }

cur_base += ret * PAGE_SIZE;
npages -= ret;
@@ -288,8 +285,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
sg = ib_umem_add_sg_table(sg, page_list, ret,
dma_get_max_seg_size(context->device->dma_device),
&umem->sg_nents);
-
- up_read(&mm->mmap_sem);
}

sg_mark_end(sg);
--
2.24.0

2019-11-12 00:12:56

by John Hubbard

Subject: [PATCH v3 02/23] mm/gup: factor out duplicate code from four routines

There are four locations in gup.c that have a fair amount of code
duplication. This means that changing one requires making the same
changes in four places, not to mention reading the same code four
times, and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount, surrounded
by get_page()/put_page().

Also, further simplify (slightly), by waiting until the successful
end of each routine to increment *nr.

Reviewed-by: Jérôme Glisse <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
mm/gup.c | 104 ++++++++++++++++++++++++-------------------------------
1 file changed, 45 insertions(+), 59 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 85caf76b3012..199da99e8ffc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1969,6 +1969,34 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
}
#endif

+static int __record_subpages(struct page *page, unsigned long addr,
+ unsigned long end, struct page **pages, int nr)
+{
+ int nr_recorded_pages = 0;
+
+ do {
+ pages[nr] = page;
+ nr++;
+ page++;
+ nr_recorded_pages++;
+ } while (addr += PAGE_SIZE, addr != end);
+ return nr_recorded_pages;
+}
+
+static void put_compound_head(struct page *page, int refs)
+{
+ /* Do a get_page() first, in case refs == page->_refcount */
+ get_page(page);
+ page_ref_sub(page, refs);
+ put_page(page);
+}
+
+static void __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
+{
+ *nr += nr_recorded_pages;
+ SetPageReferenced(head);
+}
+
#ifdef CONFIG_ARCH_HAS_HUGEPD
static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
unsigned long sz)
@@ -1998,33 +2026,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));

- refs = 0;
head = pte_page(pte);
-
page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
- do {
- VM_BUG_ON(compound_head(page) != head);
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(head, refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
- /* Could be optimized better */
- *nr -= refs;
- while (refs--)
- put_page(head);
+ put_compound_head(head, refs);
return 0;
}

- SetPageReferenced(head);
+ __huge_pt_done(head, refs, nr);
return 1;
}

@@ -2071,29 +2086,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
pages, nr);
}

- refs = 0;
page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pmd_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ put_compound_head(head, refs);
return 0;
}

- SetPageReferenced(head);
+ __huge_pt_done(head, refs, nr);
return 1;
}

@@ -2114,29 +2119,19 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
pages, nr);
}

- refs = 0;
page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pud_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ put_compound_head(head, refs);
return 0;
}

- SetPageReferenced(head);
+ __huge_pt_done(head, refs, nr);
return 1;
}

@@ -2151,29 +2146,20 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
return 0;

BUILD_BUG_ON(pgd_devmap(orig));
- refs = 0;
+
page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pgd_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ put_compound_head(head, refs);
return 0;
}

- SetPageReferenced(head);
+ __huge_pt_done(head, refs, nr);
return 1;
}

--
2.24.0

2019-11-12 00:13:06

by John Hubbard

Subject: [PATCH v3 16/23] mm/gup: track FOLL_PIN pages

Add tracking of pages that were pinned via FOLL_PIN.

As mentioned in the FOLL_PIN documentation, callers who effectively set
FOLL_PIN are required to ultimately free such pages via put_user_page().
The effect is similar to FOLL_GET, and may be thought of as "FOLL_GET
for DIO and/or RDMA use".

Pages that have been pinned via FOLL_PIN are identifiable via a
new function call:

bool page_dma_pinned(struct page *page);

What to do in response to encountering such a page is left to later
patchsets. There is discussion about this in [1].

This also changes a BUG_ON() to a WARN_ON_ONCE() in follow_page_mask().
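
As a purely hypothetical usage sketch (no such caller is added by this
series), a future writeback or page_mkclean path might consult the new
query like this:

        if (page_dma_pinned(page)) {
                /*
                 * Probably dma-pinned (false positives are possible):
                 * defer, bounce, or otherwise avoid writing the page
                 * back in place.
                 */
        }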

Suggested-by: Jan Kara <[email protected]>
Suggested-by: Jérôme Glisse <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
include/linux/mm.h | 75 ++++++++++++----
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +++
mm/gup.c | 190 +++++++++++++++++++++++++++++++++------
mm/huge_memory.c | 54 ++++++++++-
mm/hugetlb.c | 39 +++++++-
mm/vmstat.c | 2 +
7 files changed, 322 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 11e0086d64a4..19b3fa68a4da 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1054,6 +1054,8 @@ static inline __must_check bool try_get_page(struct page *page)
return true;
}

+__must_check bool user_page_ref_inc(struct page *page);
+
static inline void put_page(struct page *page)
{
page = compound_head(page);
@@ -1071,31 +1073,70 @@ static inline void put_page(struct page *page)
__put_page(page);
}

-/**
- * put_user_page() - release a gup-pinned page
- * @page: pointer to page to be released
+/*
+ * GUP_PIN_COUNTING_BIAS, and the associated functions that use it, overload
+ * the page's refcount so that two separate items are tracked: the original page
+ * reference count, and also a new count of how many get_user_pages() calls were
+ * made against the page. ("gup-pinned" is another term for the latter).
+ *
+ * With this scheme, get_user_pages() becomes special: such pages are marked
+ * as distinct from normal pages. As such, the new put_user_page() call (and
+ * its variants) must be used in order to release gup-pinned pages.
+ *
+ * Choice of value:
*
- * Pages that were pinned via get_user_pages*() must be released via
- * either put_user_page(), or one of the put_user_pages*() routines
- * below. This is so that eventually, pages that are pinned via
- * get_user_pages*() can be separately tracked and uniquely handled. In
- * particular, interactions with RDMA and filesystems need special
- * handling.
+ * By making GUP_PIN_COUNTING_BIAS a power of two, debugging of page reference
+ * counts with respect to get_user_pages() and put_user_page() becomes simpler,
+ * due to the fact that adding an even power of two to the page refcount has
+ * the effect of using only the upper N bits, for the code that counts up using
+ * the bias value. This means that the lower bits are left for the exclusive
+ * use of the original code that increments and decrements by one (or at least,
+ * by much smaller values than the bias value).
*
- * put_user_page() and put_page() are not interchangeable, despite this early
- * implementation that makes them look the same. put_user_page() calls must
- * be perfectly matched up with get_user_page() calls.
+ * Of course, once the lower bits overflow into the upper bits (and this is
+ * OK, because subtraction recovers the original values), then visual inspection
+ * no longer suffices to directly view the separate counts. However, for normal
+ * applications that don't have huge page reference counts, this won't be an
+ * issue.
+ *
+ * Locking: the lockless algorithm described in page_cache_get_speculative()
+ * and page_cache_gup_pin_speculative() provides safe operation for
+ * get_user_pages and page_mkclean and other calls that race to set up page
+ * table entries.
*/
-static inline void put_user_page(struct page *page)
-{
- put_page(page);
-}
+#define GUP_PIN_COUNTING_BIAS (1UL << 10)

+void put_user_page(struct page *page);
void put_user_pages_dirty_lock(struct page **pages, unsigned long npages,
bool make_dirty);
-
void put_user_pages(struct page **pages, unsigned long npages);

+/**
+ * page_dma_pinned() - report if a page is pinned for DMA.
+ *
+ * This function checks if a page has been pinned via a call to
+ * pin_user_pages*() or pin_longterm_pages*().
+ *
+ * The return value is partially fuzzy: false is not fuzzy, because it means
+ * "definitely not pinned for DMA", but true means "probably pinned for DMA, but
+ * possibly a false positive due to having at least GUP_PIN_COUNTING_BIAS worth
+ * of normal page references".
+ *
+ * False positives are OK, because: a) it's unlikely for a page to get that many
+ * refcounts, and b) all the callers of this routine are expected to be able to
+ * deal gracefully with a false positive.
+ *
+ * For more information, please see Documentation/vm/pin_user_pages.rst.
+ *
+ * @page: pointer to page to be queried.
+ * @Return: True, if it is likely that the page has been "dma-pinned".
+ * False, if the page is definitely not dma-pinned.
+ */
+static inline bool page_dma_pinned(struct page *page)
+{
+ return (page_ref_count(compound_head(page))) >= GUP_PIN_COUNTING_BIAS;
+}
+
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTION_IN_PAGE_FLAGS
#endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index bda20282746b..0485cba38d23 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -244,6 +244,8 @@ enum node_stat_item {
NR_DIRTIED, /* page dirtyings since bootup */
NR_WRITTEN, /* page writings since bootup */
NR_KERNEL_MISC_RECLAIMABLE, /* reclaimable non-slab kernel pages */
+ NR_FOLL_PIN_REQUESTED, /* via: pin_user_page(), gup flag: FOLL_PIN */
+ NR_FOLL_PIN_RETURNED, /* pages returned via put_user_page() */
NR_VM_NODE_STAT_ITEMS
};

diff --git a/include/linux/page_ref.h b/include/linux/page_ref.h
index 14d14beb1f7f..b9cbe553d1e7 100644
--- a/include/linux/page_ref.h
+++ b/include/linux/page_ref.h
@@ -102,6 +102,16 @@ static inline void page_ref_sub(struct page *page, int nr)
__page_ref_mod(page, -nr);
}

+static inline int page_ref_sub_return(struct page *page, int nr)
+{
+ int ret = atomic_sub_return(nr, &page->_refcount);
+
+ if (page_ref_tracepoint_active(__tracepoint_page_ref_mod))
+ __page_ref_mod(page, -nr);
+
+ return ret;
+}
+
static inline void page_ref_inc(struct page *page)
{
atomic_inc(&page->_refcount);
diff --git a/mm/gup.c b/mm/gup.c
index ea31810da828..fc164c2ee6b5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,95 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return head;
}

+#ifdef CONFIG_DEBUG_VM
+static inline void __update_proc_vmstat(struct page *page,
+ enum node_stat_item item, int count)
+{
+ mod_node_page_state(page_pgdat(page), item, count);
+}
+#else
+static inline void __update_proc_vmstat(struct page *page,
+ enum node_stat_item item, int count)
+{
+}
+#endif
+
+/**
+ * user_page_ref_inc() - mark a page as being used by get_user_pages(FOLL_PIN).
+ *
+ * @page: pointer to page to be marked
+ * @Return: true for success, false for failure
+ */
+__must_check bool user_page_ref_inc(struct page *page)
+{
+ page = try_get_compound_head(page, GUP_PIN_COUNTING_BIAS);
+ if (!page)
+ return false;
+
+ __update_proc_vmstat(page, NR_FOLL_PIN_REQUESTED, 1);
+ return true;
+}
+
+#ifdef CONFIG_DEV_PAGEMAP_OPS
+static bool __put_devmap_managed_user_page(struct page *page)
+{
+ bool is_devmap = page_is_devmap_managed(page);
+
+ if (is_devmap) {
+ int count = page_ref_sub_return(page, GUP_PIN_COUNTING_BIAS);
+
+ __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1);
+ /*
+ * devmap page refcounts are 1-based, rather than 0-based: if
+ * refcount is 1, then the page is free and the refcount is
+ * stable because nobody holds a reference on the page.
+ */
+ if (count == 1)
+ free_devmap_managed_page(page);
+ else if (!count)
+ __put_page(page);
+ }
+
+ return is_devmap;
+}
+#else
+static bool __put_devmap_managed_user_page(struct page *page)
+{
+ return false;
+}
+#endif /* CONFIG_DEV_PAGEMAP_OPS */
+
+/**
+ * put_user_page() - release a gup-pinned page
+ * @page: pointer to page to be released
+ *
+ * Pages that were pinned via get_user_pages*() must be released via
+ * either put_user_page(), or one of the put_user_pages*() routines
+ * below. This is so that eventually, pages that are pinned via
+ * get_user_pages*() can be separately tracked and uniquely handled. In
+ * particular, interactions with RDMA and filesystems need special
+ * handling.
+ */
+void put_user_page(struct page *page)
+{
+ page = compound_head(page);
+
+ /*
+ * For devmap managed pages we need to catch refcount transition from
+ * GUP_PIN_COUNTING_BIAS to 1, when refcount reach one it means the
+ * page is free and we need to inform the device driver through
+ * callback. See include/linux/memremap.h and HMM for details.
+ */
+ if (__put_devmap_managed_user_page(page))
+ return;
+
+ if (page_ref_sub_and_test(page, GUP_PIN_COUNTING_BIAS))
+ __put_page(page);
+
+ __update_proc_vmstat(page, NR_FOLL_PIN_RETURNED, 1);
+}
+EXPORT_SYMBOL(put_user_page);
+
/**
* put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
* @pages: array of pages to be maybe marked dirty, and definitely released.
@@ -230,10 +319,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
}

page = vm_normal_page(vma, address, pte);
- if (!page && pte_devmap(pte) && (flags & FOLL_GET)) {
+ if (!page && pte_devmap(pte) && (flags & (FOLL_GET | FOLL_PIN))) {
/*
- * Only return device mapping pages in the FOLL_GET case since
- * they are only valid while holding the pgmap reference.
+ * Only return device mapping pages in the FOLL_GET or FOLL_PIN
+ * case since they are only valid while holding the pgmap
+ * reference.
*/
*pgmap = get_dev_pagemap(pte_pfn(pte), *pgmap);
if (*pgmap)
@@ -276,6 +366,11 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
page = ERR_PTR(-ENOMEM);
goto out;
}
+ } else if (flags & FOLL_PIN) {
+ if (unlikely(!user_page_ref_inc(page))) {
+ page = ERR_PTR(-ENOMEM);
+ goto out;
+ }
}
if (flags & FOLL_TOUCH) {
if ((flags & FOLL_WRITE) &&
@@ -537,8 +632,8 @@ static struct page *follow_page_mask(struct vm_area_struct *vma,
/* make this handle hugepd */
page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
if (!IS_ERR(page)) {
- BUG_ON(flags & FOLL_GET);
- return page;
+ WARN_ON_ONCE(flags & (FOLL_GET | FOLL_PIN));
+ return NULL;
}

pgd = pgd_offset(mm, address);
@@ -1830,13 +1925,17 @@ static inline pte_t gup_get_pte(pte_t *ptep)
#endif /* CONFIG_GUP_GET_PTE_LOW_HIGH */

static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
+ unsigned int flags,
struct page **pages)
{
while ((*nr) - nr_start) {
struct page *page = pages[--(*nr)];

ClearPageReferenced(page);
- put_page(page);
+ if (flags & FOLL_PIN)
+ put_user_page(page);
+ else
+ put_page(page);
}
}

@@ -1869,7 +1968,7 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,

pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);
if (unlikely(!pgmap)) {
- undo_dev_pagemap(nr, nr_start, pages);
+ undo_dev_pagemap(nr, nr_start, flags, pages);
goto pte_unmap;
}
} else if (pte_special(pte))
@@ -1878,9 +1977,15 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);

- head = try_get_compound_head(page, 1);
- if (!head)
- goto pte_unmap;
+ if (flags & FOLL_PIN) {
+ head = page;
+ if (unlikely(!user_page_ref_inc(head)))
+ goto pte_unmap;
+ } else {
+ head = try_get_compound_head(page, 1);
+ if (!head)
+ goto pte_unmap;
+ }

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
put_page(head);
@@ -1934,12 +2039,20 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,

pgmap = get_dev_pagemap(pfn, pgmap);
if (unlikely(!pgmap)) {
- undo_dev_pagemap(nr, nr_start, pages);
+ undo_dev_pagemap(nr, nr_start, flags, pages);
return 0;
}
SetPageReferenced(page);
pages[*nr] = page;
- get_page(page);
+
+ if (flags & FOLL_PIN) {
+ if (unlikely(!user_page_ref_inc(page))) {
+ undo_dev_pagemap(nr, nr_start, flags, pages);
+ return 0;
+ }
+ } else
+ get_page(page);
+
(*nr)++;
pfn++;
} while (addr += PAGE_SIZE, addr != end);
@@ -1961,7 +2074,7 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
return 0;

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
- undo_dev_pagemap(nr, nr_start, pages);
+ undo_dev_pagemap(nr, nr_start, flags, pages);
return 0;
}
return 1;
@@ -1979,7 +2092,7 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
return 0;

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
- undo_dev_pagemap(nr, nr_start, pages);
+ undo_dev_pagemap(nr, nr_start, flags, pages);
return 0;
}
return 1;
@@ -2063,9 +2176,16 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
refs = __record_subpages(page, addr, end, pages, *nr);

- head = try_get_compound_head(head, refs);
- if (!head)
- return 0;
+ if (flags & FOLL_PIN) {
+ head = page;
+ if (unlikely(!user_page_ref_inc(head)))
+ return 0;
+ head = page;
+ } else {
+ head = try_get_compound_head(head, refs);
+ if (!head)
+ return 0;
+ }

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
put_compound_head(head, refs);
@@ -2122,9 +2242,15 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = __record_subpages(page, addr, end, pages, *nr);

- head = try_get_compound_head(pmd_page(orig), refs);
- if (!head)
- return 0;
+ if (flags & FOLL_PIN) {
+ head = page;
+ if (unlikely(!user_page_ref_inc(head)))
+ return 0;
+ } else {
+ head = try_get_compound_head(pmd_page(orig), refs);
+ if (!head)
+ return 0;
+ }

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
put_compound_head(head, refs);
@@ -2155,9 +2281,15 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = __record_subpages(page, addr, end, pages, *nr);

- head = try_get_compound_head(pud_page(orig), refs);
- if (!head)
- return 0;
+ if (flags & FOLL_PIN) {
+ head = page;
+ if (unlikely(!user_page_ref_inc(head)))
+ return 0;
+ } else {
+ head = try_get_compound_head(pud_page(orig), refs);
+ if (!head)
+ return 0;
+ }

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
put_compound_head(head, refs);
@@ -2183,9 +2315,15 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
refs = __record_subpages(page, addr, end, pages, *nr);

- head = try_get_compound_head(pgd_page(orig), refs);
- if (!head)
- return 0;
+ if (flags & FOLL_PIN) {
+ head = page;
+ if (unlikely(!user_page_ref_inc(head)))
+ return 0;
+ } else {
+ head = try_get_compound_head(pgd_page(orig), refs);
+ if (!head)
+ return 0;
+ }

if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
put_compound_head(head, refs);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 13cc93785006..4010c269e9e5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -945,6 +945,11 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
*/
WARN_ONCE(flags & FOLL_COW, "mm: In follow_devmap_pmd with FOLL_COW set");

+ /* FOLL_GET and FOLL_PIN are mutually exclusive. */
+ if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+ (FOLL_PIN | FOLL_GET)))
+ return NULL;
+
if (flags & FOLL_WRITE && !pmd_write(*pmd))
return NULL;

@@ -960,7 +965,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
* device mapped pages can only be returned if the
* caller will manage the page reference count.
*/
- if (!(flags & FOLL_GET))
+ if (!(flags & (FOLL_GET | FOLL_PIN)))
return ERR_PTR(-EEXIST);

pfn += (addr & ~PMD_MASK) >> PAGE_SHIFT;
@@ -968,7 +973,18 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- get_page(page);
+
+ if (flags & FOLL_GET)
+ get_page(page);
+ else if (flags & FOLL_PIN) {
+ /*
+ * user_page_ref_inc() is not actually expected to fail here
+ * because we hold the pmd lock so no one can unmap the pmd and
+ * free the page that it points to.
+ */
+ if (unlikely(!user_page_ref_inc(page)))
+ page = ERR_PTR(-EFAULT);
+ }

return page;
}
@@ -1088,6 +1104,11 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
if (flags & FOLL_WRITE && !pud_write(*pud))
return NULL;

+ /* FOLL_GET and FOLL_PIN are mutually exclusive. */
+ if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+ (FOLL_PIN | FOLL_GET)))
+ return NULL;
+
if (pud_present(*pud) && pud_devmap(*pud))
/* pass */;
else
@@ -1099,8 +1120,10 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
/*
* device mapped pages can only be returned if the
* caller will manage the page reference count.
+ *
+ * At least one of FOLL_GET | FOLL_PIN must be set, so assert that here:
*/
- if (!(flags & FOLL_GET))
+ if (!(flags & (FOLL_GET | FOLL_PIN)))
return ERR_PTR(-EEXIST);

pfn += (addr & ~PUD_MASK) >> PAGE_SHIFT;
@@ -1108,7 +1131,18 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- get_page(page);
+
+ if (flags & FOLL_GET)
+ get_page(page);
+ else if (flags & FOLL_PIN) {
+ /*
+ * user_page_ref_inc() is not actually expected to fail here
+ * because we hold the pud lock so no one can unmap the pud and
+ * free the page that it points to.
+ */
+ if (unlikely(!user_page_ref_inc(page)))
+ page = ERR_PTR(-EFAULT);
+ }

return page;
}
@@ -1522,8 +1556,20 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
skip_mlock:
page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
VM_BUG_ON_PAGE(!PageCompound(page) && !is_zone_device_page(page), page);
+
if (flags & FOLL_GET)
get_page(page);
+ else if (flags & FOLL_PIN) {
+ /*
+ * user_page_ref_inc() is not actually expected to fail here
+ * because we hold the pmd lock so no one can unmap the pmd and
+ * free the page that it points to.
+ */
+ if (unlikely(!user_page_ref_inc(page))) {
+ WARN_ON_ONCE(1);
+ page = NULL;
+ }
+ }

out:
return page;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b45a95363a84..5ee80eea25e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4462,7 +4462,22 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
same_page:
if (pages) {
pages[i] = mem_map_offset(page, pfn_offset);
- get_page(pages[i]);
+
+ if (flags & FOLL_GET)
+ get_page(pages[i]);
+ else if (flags & FOLL_PIN) {
+ /*
+ * user_page_ref_inc() is not actually expected
+ * to fail here because we hold the ptl.
+ */
+ if (unlikely(!user_page_ref_inc(pages[i]))) {
+ spin_unlock(ptl);
+ remainder = 0;
+ err = -ENOMEM;
+ WARN_ON_ONCE(1);
+ break;
+ }
+ }
}

if (vmas)
@@ -5022,6 +5037,12 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
struct page *page = NULL;
spinlock_t *ptl;
pte_t pte;
+
+ /* FOLL_GET and FOLL_PIN are mutually exclusive. */
+ if (WARN_ON_ONCE((flags & (FOLL_PIN | FOLL_GET)) ==
+ (FOLL_PIN | FOLL_GET)))
+ return NULL;
+
retry:
ptl = pmd_lockptr(mm, pmd);
spin_lock(ptl);
@@ -5034,8 +5055,20 @@ follow_huge_pmd(struct mm_struct *mm, unsigned long address,
pte = huge_ptep_get((pte_t *)pmd);
if (pte_present(pte)) {
page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
+
if (flags & FOLL_GET)
get_page(page);
+ else if (flags & FOLL_PIN) {
+ /*
+ * user_page_ref_inc() is not actually expected to fail
+ * here because we hold the ptl.
+ */
+ if (unlikely(!user_page_ref_inc(page))) {
+ WARN_ON_ONCE(1);
+ page = NULL;
+ goto out;
+ }
+ }
} else {
if (is_hugetlb_entry_migration(pte)) {
spin_unlock(ptl);
@@ -5056,7 +5089,7 @@ struct page * __weak
follow_huge_pud(struct mm_struct *mm, unsigned long address,
pud_t *pud, int flags)
{
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return NULL;

return pte_page(*(pte_t *)pud) + ((address & ~PUD_MASK) >> PAGE_SHIFT);
@@ -5065,7 +5098,7 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
struct page * __weak
follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags)
{
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return NULL;

return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> PAGE_SHIFT);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a8222041bd44..fdad40ccde7b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1167,6 +1167,8 @@ const char * const vmstat_text[] = {
"nr_dirtied",
"nr_written",
"nr_kernel_misc_reclaimable",
+ "nr_foll_pin_requested",
+ "nr_foll_pin_returned",

/* enum writeback_stat_item counters */
"nr_dirty_threshold",
--
2.24.0

2019-11-12 00:13:07

by John Hubbard

Subject: [PATCH v3 04/23] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages

An upcoming patch changes and complicates the refcounting and
especially the "put page" aspects of it. In order to keep
everything clean, refactor the devmap page release routines:

* Rename put_devmap_managed_page() to page_is_devmap_managed(),
and limit the functionality to "read only": return a bool,
with no side effects.

* Add a new routine, put_devmap_managed_page(), to handle checking
what kind of page it is, and what kind of refcount handling it
requires.

* Rename __put_devmap_managed_page() to free_devmap_managed_page(),
and limit the functionality to unconditionally freeing a devmap
page.

This is originally based on a separate patch by Ira Weiny, which
applied to an early version of the put_user_page() experiments.
Since then, Jérôme Glisse suggested the refactoring described above.

Suggested-by: Jérôme Glisse <[email protected]>
Signed-off-by: Ira Weiny <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
include/linux/mm.h | 27 ++++++++++++++++---
mm/memremap.c | 67 ++++++++++++++++++++--------------------------
2 files changed, 53 insertions(+), 41 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a2adf95b3f9c..96228376139c 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -967,9 +967,10 @@ static inline bool is_zone_device_page(const struct page *page)
#endif

#ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page);
+void free_devmap_managed_page(struct page *page);
DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
-static inline bool put_devmap_managed_page(struct page *page)
+
+static inline bool page_is_devmap_managed(struct page *page)
{
if (!static_branch_unlikely(&devmap_managed_key))
return false;
@@ -978,7 +979,6 @@ static inline bool put_devmap_managed_page(struct page *page)
switch (page->pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
case MEMORY_DEVICE_FS_DAX:
- __put_devmap_managed_page(page);
return true;
default:
break;
@@ -986,6 +986,27 @@ static inline bool put_devmap_managed_page(struct page *page)
return false;
}

+static inline bool put_devmap_managed_page(struct page *page)
+{
+ bool is_devmap = page_is_devmap_managed(page);
+
+ if (is_devmap) {
+ int count = page_ref_dec_return(page);
+
+ /*
+ * devmap page refcounts are 1-based, rather than 0-based: if
+ * refcount is 1, then the page is free and the refcount is
+ * stable because nobody holds a reference on the page.
+ */
+ if (count == 1)
+ free_devmap_managed_page(page);
+ else if (!count)
+ __put_page(page);
+ }
+
+ return is_devmap;
+}
+
#else /* CONFIG_DEV_PAGEMAP_OPS */
static inline bool put_devmap_managed_page(struct page *page)
{
diff --git a/mm/memremap.c b/mm/memremap.c
index 03ccbdfeb697..bc7e2a27d025 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -410,48 +410,39 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
EXPORT_SYMBOL_GPL(get_dev_pagemap);

#ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page)
+void free_devmap_managed_page(struct page *page)
{
- int count = page_ref_dec_return(page);
+ /* Clear Active bit in case of parallel mark_page_accessed */
+ __ClearPageActive(page);
+ __ClearPageWaiters(page);
+
+ mem_cgroup_uncharge(page);

/*
- * If refcount is 1 then page is freed and refcount is stable as nobody
- * holds a reference on the page.
+ * When a device_private page is freed, the page->mapping field
+ * may still contain a (stale) mapping value. For example, the
+ * lower bits of page->mapping may still identify the page as
+ * an anonymous page. Ultimately, this entire field is just
+ * stale and wrong, and it will cause errors if not cleared.
+ * One example is:
+ *
+ * migrate_vma_pages()
+ * migrate_vma_insert_page()
+ * page_add_new_anon_rmap()
+ * __page_set_anon_rmap()
+ * ...checks page->mapping, via PageAnon(page) call,
+ * and incorrectly concludes that the page is an
+ * anonymous page. Therefore, it incorrectly,
+ * silently fails to set up the new anon rmap.
+ *
+ * For other types of ZONE_DEVICE pages, migration is either
+ * handled differently or not done at all, so there is no need
+ * to clear page->mapping.
*/
- if (count == 1) {
- /* Clear Active bit in case of parallel mark_page_accessed */
- __ClearPageActive(page);
- __ClearPageWaiters(page);
-
- mem_cgroup_uncharge(page);
-
- /*
- * When a device_private page is freed, the page->mapping field
- * may still contain a (stale) mapping value. For example, the
- * lower bits of page->mapping may still identify the page as
- * an anonymous page. Ultimately, this entire field is just
- * stale and wrong, and it will cause errors if not cleared.
- * One example is:
- *
- * migrate_vma_pages()
- * migrate_vma_insert_page()
- * page_add_new_anon_rmap()
- * __page_set_anon_rmap()
- * ...checks page->mapping, via PageAnon(page) call,
- * and incorrectly concludes that the page is an
- * anonymous page. Therefore, it incorrectly,
- * silently fails to set up the new anon rmap.
- *
- * For other types of ZONE_DEVICE pages, migration is either
- * handled differently or not done at all, so there is no need
- * to clear page->mapping.
- */
- if (is_device_private_page(page))
- page->mapping = NULL;
+ if (is_device_private_page(page))
+ page->mapping = NULL;

- page->pgmap->ops->page_free(page);
- } else if (!count)
- __put_page(page);
+ page->pgmap->ops->page_free(page);
}
-EXPORT_SYMBOL(__put_devmap_managed_page);
+EXPORT_SYMBOL(free_devmap_managed_page);
#endif /* CONFIG_DEV_PAGEMAP_OPS */
--
2.24.0

2019-11-12 00:13:52

by John Hubbard

Subject: [PATCH v3 10/23] goldfish_pipe: convert to pin_user_pages() and put_user_page()

1. Call the new global pin_user_pages_fast() from pin_goldfish_pages().

2. As required by pin_user_pages(), release these pages via
put_user_page(). In this case, do so via put_user_pages_dirty_lock().

That has the side effect of calling set_page_dirty_lock(), instead
of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

Another side effect is that the release code is simplified, because
the page[] loop is now in gup.c instead of here. So, delete the
local release_user_pages() entirely, and call
put_user_pages_dirty_lock() directly instead.

[1] https://lore.kernel.org/r/[email protected]

Reviewed-by: Ira Weiny <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
drivers/platform/goldfish/goldfish_pipe.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c
index 7ed2a21a0bac..635a8bc1b480 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -274,7 +274,7 @@ static int pin_goldfish_pages(unsigned long first_page,
*iter_last_page_size = last_page_size;
}

- ret = get_user_pages_fast(first_page, requested_pages,
+ ret = pin_user_pages_fast(first_page, requested_pages,
!is_write ? FOLL_WRITE : 0,
pages);
if (ret <= 0)
@@ -285,18 +285,6 @@ static int pin_goldfish_pages(unsigned long first_page,
return ret;
}

-static void release_user_pages(struct page **pages, int pages_count,
- int is_write, s32 consumed_size)
-{
- int i;
-
- for (i = 0; i < pages_count; i++) {
- if (!is_write && consumed_size > 0)
- set_page_dirty(pages[i]);
- put_page(pages[i]);
- }
-}
-
/* Populate the call parameters, merging adjacent pages together */
static void populate_rw_params(struct page **pages,
int pages_count,
@@ -372,7 +360,8 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe,

*consumed_size = pipe->command_buffer->rw_params.consumed_size;

- release_user_pages(pipe->pages, pages_count, is_write, *consumed_size);
+ put_user_pages_dirty_lock(pipe->pages, pages_count,
+ !is_write && *consumed_size > 0);

mutex_unlock(&pipe->lock);
return 0;
--
2.24.0

2019-11-12 00:14:11

by John Hubbard

Subject: [PATCH v3 01/23] mm/gup: pass flags arg to __gup_device_* functions

A subsequent patch requires access to gup flags, so
pass the flags argument through to the __gup_device_*
functions.

Also placate checkpatch.pl by shortening a nearby line.

Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Ira Weiny <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
mm/gup.c | 28 ++++++++++++++++++----------
1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 8f236a335ae9..85caf76b3012 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1890,7 +1890,8 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,

#if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
static int __gup_device_huge(unsigned long pfn, unsigned long addr,
- unsigned long end, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
int nr_start = *nr;
struct dev_pagemap *pgmap = NULL;
@@ -1916,13 +1917,14 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
}

static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
- unsigned long end, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
unsigned long fault_pfn;
int nr_start = *nr;

fault_pfn = pmd_pfn(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- if (!__gup_device_huge(fault_pfn, addr, end, pages, nr))
+ if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr))
return 0;

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
@@ -1933,13 +1935,14 @@ static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
}

static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
- unsigned long end, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
unsigned long fault_pfn;
int nr_start = *nr;

fault_pfn = pud_pfn(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- if (!__gup_device_huge(fault_pfn, addr, end, pages, nr))
+ if (!__gup_device_huge(fault_pfn, addr, end, flags, pages, nr))
return 0;

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
@@ -1950,14 +1953,16 @@ static int __gup_device_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
}
#else
static int __gup_device_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
- unsigned long end, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
BUILD_BUG();
return 0;
}

static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
- unsigned long end, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
BUILD_BUG();
return 0;
@@ -2062,7 +2067,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
if (pmd_devmap(orig)) {
if (unlikely(flags & FOLL_LONGTERM))
return 0;
- return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
+ return __gup_device_huge_pmd(orig, pmdp, addr, end, flags,
+ pages, nr);
}

refs = 0;
@@ -2092,7 +2098,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
}

static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
- unsigned long end, unsigned int flags, struct page **pages, int *nr)
+ unsigned long end, unsigned int flags,
+ struct page **pages, int *nr)
{
struct page *head, *page;
int refs;
@@ -2103,7 +2110,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
if (pud_devmap(orig)) {
if (unlikely(flags & FOLL_LONGTERM))
return 0;
- return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
+ return __gup_device_huge_pud(orig, pudp, addr, end, flags,
+ pages, nr);
}

refs = 0;
--
2.24.0

2019-11-12 00:14:46

by John Hubbard

Subject: [PATCH v3 03/23] mm/gup: move try_get_compound_head() to top, fix minor issues

An upcoming patch uses try_get_compound_head() more widely,
so move it to the top of gup.c.

Also fix a tiny spelling error and a checkpatch.pl warning.

Signed-off-by: John Hubbard <[email protected]>
---
mm/gup.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 199da99e8ffc..933524de6249 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,21 @@ struct follow_page_context {
unsigned int page_mask;
};

+/*
+ * Return the compound head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+ struct page *head = compound_head(page);
+
+ if (WARN_ON_ONCE(page_ref_count(head) < 0))
+ return NULL;
+ if (unlikely(!page_cache_add_speculative(head, refs)))
+ return NULL;
+ return head;
+}
+
/**
* put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
* @pages: array of pages to be maybe marked dirty, and definitely released.
@@ -1793,20 +1808,6 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
}

-/*
- * Return the compund head page with ref appropriately incremented,
- * or NULL if that failed.
- */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
-{
- struct page *head = compound_head(page);
- if (WARN_ON_ONCE(page_ref_count(head) < 0))
- return NULL;
- if (unlikely(!page_cache_add_speculative(head, refs)))
- return NULL;
- return head;
-}
-
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
unsigned int flags, struct page **pages, int *nr)
--
2.24.0

2019-11-12 20:39:54

by Jason Gunthorpe

Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On Mon, Nov 11, 2019 at 04:06:37PM -0800, John Hubbard wrote:
> Hi,
>
> The cover letter is long, so the more important stuff is first:
>
> * Jason, if you or someone could look at the the VFIO cleanup (patch 8)
> and conversion to FOLL_PIN (patch 18), to make sure it's use of
> remote and longterm gup matches what we discussed during the review
> of v2, I'd appreciate it.
>
> * Also for Jason and IB: as noted below, in patch 11, I am (too?) boldly
> converting from put_user_pages() to release_pages().

Why are we doing this? I think things got confused here someplace, as
the comment still says:

/**
* put_user_page() - release a gup-pinned page
* @page: pointer to page to be released
*
* Pages that were pinned via get_user_pages*() must be released via
* either put_user_page(), or one of the put_user_pages*() routines
* below.

I feel like if put_user_pages() is not the correct way to undo
get_user_pages() then it needs to be deleted.

Jason

2019-11-12 21:13:02

by John Hubbard

Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On 11/12/19 12:38 PM, Jason Gunthorpe wrote:
> On Mon, Nov 11, 2019 at 04:06:37PM -0800, John Hubbard wrote:
>> Hi,
>>
>> The cover letter is long, so the more important stuff is first:
>>
>> * Jason, if you or someone could look at the the VFIO cleanup (patch 8)
>> and conversion to FOLL_PIN (patch 18), to make sure it's use of
>> remote and longterm gup matches what we discussed during the review
>> of v2, I'd appreciate it.
>>
>> * Also for Jason and IB: as noted below, in patch 11, I am (too?) boldly
>> converting from put_user_pages() to release_pages().
>
> Why are we doing this? I think things got confused here someplace, as


Because:

a) These need put_page() calls, and

b) there is no put_pages() call, but there is a release_pages() call that
is, arguably, what put_pages() would be.


> the comment still says:
>
> /**
> * put_user_page() - release a gup-pinned page
> * @page: pointer to page to be released
> *
> * Pages that were pinned via get_user_pages*() must be released via
> * either put_user_page(), or one of the put_user_pages*() routines
> * below.


Ohhh, I missed those comments. They all need to be changed over to
say "pages that were pinned via pin_user_pages*() or
pin_longterm_pages*() must be released via put_user_page*()."

The get_user_pages*() pages must still be released via put_page().

The churn is due to a fairly significant change in strategy, which
is: instead of changing all get_user_pages*() sites to call
put_user_page(), change selected sites to call pin_user_pages*() or
pin_longterm_pages*(), plus put_user_page().

That allows incrementally converting the kernel over to using the
new pin APIs, without taking on the huge risk of a big one-shot
conversion.

So, I've ended up with one place that actually needs to get reverted
back to get_user_pages(), and that's the IB ODP code.

>
> I feel like if put_user_pages() is not the correct way to undo
> get_user_pages() then it needs to be deleted.
>

Yes, you're right. I'll fix the put_user_page() comments as described.


thanks,

John Hubbard
NVIDIA

2019-11-13 08:24:19

by Daniel Vetter

Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On Tue, Nov 12, 2019 at 10:10 PM John Hubbard <[email protected]> wrote:
>
> On 11/12/19 12:38 PM, Jason Gunthorpe wrote:
> > On Mon, Nov 11, 2019 at 04:06:37PM -0800, John Hubbard wrote:
> >> Hi,
> >>
> >> The cover letter is long, so the more important stuff is first:
> >>
> >> * Jason, if you or someone could look at the the VFIO cleanup (patch 8)
> >> and conversion to FOLL_PIN (patch 18), to make sure it's use of
> >> remote and longterm gup matches what we discussed during the review
> >> of v2, I'd appreciate it.
> >>
> >> * Also for Jason and IB: as noted below, in patch 11, I am (too?) boldly
> >> converting from put_user_pages() to release_pages().
> >
> > Why are we doing this? I think things got confused here someplace, as
>
>
> Because:
>
> a) These need put_page() calls, and
>
> b) there is no put_pages() call, but there is a release_pages() call that
> is, arguably, what put_pages() would be.
>
>
> > the comment still says:
> >
> > /**
> > * put_user_page() - release a gup-pinned page
> > * @page: pointer to page to be released
> > *
> > * Pages that were pinned via get_user_pages*() must be released via
> > * either put_user_page(), or one of the put_user_pages*() routines
> > * below.
>
>
> Ohhh, I missed those comments. They need to all be changed over to
> say "pages that were pinned via pin_user_pages*() or
> pin_longterm_pages*() must be released via put_user_page*()."
>
> The get_user_pages*() pages must still be released via put_page().
>
> The churn is due to a fairly significant change in strategy, which
> is: instead of changing all get_user_pages*() sites to call
> put_user_page(), change selected sites to call pin_user_pages*() or
> pin_longterm_pages*(), plus put_user_page().

Can't we call this unpin_user_page then, for some symmetry? Or is that
even more churn?

Looking from afar the naming here seems really confusing.
-Daniel

> That allows incrementally converting the kernel over to using the
> new pin APIs, without taking on the huge risk of a big one-shot
> conversion.
>
> So, I've ended up with one place that actually needs to get reverted
> back to get_user_pages(), and that's the IB ODP code.
>
> >
> > I feel like if put_user_pages() is not the correct way to undo
> > get_user_pages() then it needs to be deleted.
> >
>
> Yes, you're right. I'll fix the put_user_page() comments as described.
>
>
> thanks,
>
> John Hubbard
> NVIDIA



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

2019-11-13 09:05:56

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On 11/13/19 12:22 AM, Daniel Vetter wrote:
...
>>> Why are we doing this? I think things got confused here someplace, as
>>
>>
>> Because:
>>
>> a) These need put_page() calls, and
>>
>> b) there is no put_pages() call, but there is a release_pages() call that
>> is, arguably, what put_pages() would be.
>>
>>
>>> the comment still says:
>>>
>>> /**
>>> * put_user_page() - release a gup-pinned page
>>> * @page: pointer to page to be released
>>> *
>>> * Pages that were pinned via get_user_pages*() must be released via
>>> * either put_user_page(), or one of the put_user_pages*() routines
>>> * below.
>>
>>
>> Ohhh, I missed those comments. They need to all be changed over to
>> say "pages that were pinned via pin_user_pages*() or
>> pin_longterm_pages*() must be released via put_user_page*()."
>>
>> The get_user_pages*() pages must still be released via put_page().
>>
>> The churn is due to a fairly significant change in strategy, which
>> is: instead of changing all get_user_pages*() sites to call
>> put_user_page(), change selected sites to call pin_user_pages*() or
>> pin_longterm_pages*(), plus put_user_page().
>
> Can't we call this unpin_user_page then, for some symmetry? Or is that
> even more churn?
>
> Looking from afar the naming here seems really confusing.


That look from afar is valuable, because I'm too close to the problem to see
how the naming looks. :)

unpin_user_page() sounds symmetrical. It's true that it would cause more
churn (which is why I started off with a proposal that avoids changing the
names of put_user_page*() APIs). But OTOH, the amount of churn is proportional
to the change in direction here, and it's really only 10 or 20 lines changed,
in the end.

So I'm open to changing to that naming. It would be nice to hear what others
prefer, too...


thanks,
--
John Hubbard
NVIDIA

2019-11-13 10:13:22

by Jan Kara

[permalink] [raw]
Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On Wed 13-11-19 01:02:02, John Hubbard wrote:
> On 11/13/19 12:22 AM, Daniel Vetter wrote:
> ...
> > > > Why are we doing this? I think things got confused here someplace, as
> > >
> > >
> > > Because:
> > >
> > > a) These need put_page() calls, and
> > >
> > > b) there is no put_pages() call, but there is a release_pages() call that
> > > is, arguably, what put_pages() would be.
> > >
> > >
> > > > the comment still says:
> > > >
> > > > /**
> > > > * put_user_page() - release a gup-pinned page
> > > > * @page: pointer to page to be released
> > > > *
> > > > * Pages that were pinned via get_user_pages*() must be released via
> > > > * either put_user_page(), or one of the put_user_pages*() routines
> > > > * below.
> > >
> > >
> > > Ohhh, I missed those comments. They need to all be changed over to
> > > say "pages that were pinned via pin_user_pages*() or
> > > pin_longterm_pages*() must be released via put_user_page*()."
> > >
> > > The get_user_pages*() pages must still be released via put_page().
> > >
> > > The churn is due to a fairly significant change in strategy, which
> > > is: instead of changing all get_user_pages*() sites to call
> > > put_user_page(), change selected sites to call pin_user_pages*() or
> > > pin_longterm_pages*(), plus put_user_page().
> >
> > Can't we call this unpin_user_page then, for some symmetry? Or is that
> > even more churn?
> >
> > Looking from afar the naming here seems really confusing.
>
>
> That look from afar is valuable, because I'm too close to the problem to see
> how the naming looks. :)
>
> unpin_user_page() sounds symmetrical. It's true that it would cause more
> churn (which is why I started off with a proposal that avoids changing the
> names of put_user_page*() APIs). But OTOH, the amount of churn is proportional
> to the change in direction here, and it's really only 10 or 20 lines changed,
> in the end.
>
> So I'm open to changing to that naming. It would be nice to hear what others
> prefer, too...

FWIW I'd find unpin_user_page() also better than put_user_page() as a
counterpart to pin_user_pages().

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2019-11-13 11:44:40

by Daniel Vetter

[permalink] [raw]
Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On Wed, Nov 13, 2019 at 11:12:10AM +0100, Jan Kara wrote:
> On Wed 13-11-19 01:02:02, John Hubbard wrote:
> > On 11/13/19 12:22 AM, Daniel Vetter wrote:
> > ...
> > > > > Why are we doing this? I think things got confused here someplace, as
> > > >
> > > >
> > > > Because:
> > > >
> > > > a) These need put_page() calls, and
> > > >
> > > > b) there is no put_pages() call, but there is a release_pages() call that
> > > > is, arguably, what put_pages() would be.
> > > >
> > > >
> > > > > the comment still says:
> > > > >
> > > > > /**
> > > > > * put_user_page() - release a gup-pinned page
> > > > > * @page: pointer to page to be released
> > > > > *
> > > > > * Pages that were pinned via get_user_pages*() must be released via
> > > > > * either put_user_page(), or one of the put_user_pages*() routines
> > > > > * below.
> > > >
> > > >
> > > > Ohhh, I missed those comments. They need to all be changed over to
> > > > say "pages that were pinned via pin_user_pages*() or
> > > > pin_longterm_pages*() must be released via put_user_page*()."
> > > >
> > > > The get_user_pages*() pages must still be released via put_page().
> > > >
> > > > The churn is due to a fairly significant change in strategy, which
> > > > is: instead of changing all get_user_pages*() sites to call
> > > > put_user_page(), change selected sites to call pin_user_pages*() or
> > > > pin_longterm_pages*(), plus put_user_page().
> > >
> > > Can't we call this unpin_user_page then, for some symmetry? Or is that
> > > even more churn?
> > >
> > > Looking from afar the naming here seems really confusing.
> >
> >
> > That look from afar is valuable, because I'm too close to the problem to see
> > how the naming looks. :)
> >
> > unpin_user_page() sounds symmetrical. It's true that it would cause more
> > churn (which is why I started off with a proposal that avoids changing the
> > names of put_user_page*() APIs). But OTOH, the amount of churn is proportional
> > to the change in direction here, and it's really only 10 or 20 lines changed,
> > in the end.
> >
> > So I'm open to changing to that naming. It would be nice to hear what others
> > prefer, too...
>
> FWIW I'd find unpin_user_page() also better than put_user_page() as a
> counterpart to pin_user_pages().

One more point from afar on pin/unpin: We use that a lot in graphics for
permanently pinned graphics buffer objects. Which really only should be
used for scanout. So at least graphics folks should have an appropriate
mindset and try to make sure we don't overuse this stuff.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2019-11-13 20:31:56

by John Hubbard

[permalink] [raw]
Subject: Re: [PATCH v3 00/23] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

On 11/13/19 3:43 AM, Daniel Vetter wrote:
...
>>>> Can't we call this unpin_user_page then, for some symmetry? Or is that
>>>> even more churn?
>>>>
>>>> Looking from afar the naming here seems really confusing.
>>>
>>>
>>> That look from afar is valuable, because I'm too close to the problem to see
>>> how the naming looks. :)
>>>
>>> unpin_user_page() sounds symmetrical. It's true that it would cause more
>>> churn (which is why I started off with a proposal that avoids changing the
>>> names of put_user_page*() APIs). But OTOH, the amount of churn is proportional
>>> to the change in direction here, and it's really only 10 or 20 lines changed,
>>> in the end.
>>>
>>> So I'm open to changing to that naming. It would be nice to hear what others
>>> prefer, too...
>>
>> FWIW I'd find unpin_user_page() also better than put_user_page() as a
>> counterpart to pin_user_pages().
>
> One more point from afar on pin/unpin: We use that a lot in graphics for
> permanently pinned graphics buffer objects. Which really only should be
> used for scanout. So at least graphics folks should have an appropriate
> mindset and try to make sure we don't overuse this stuff.
> -Daniel
>

OK, Ira also likes "unpin", and so far no one has said *anything* in favor
of the "put_user_page" names, so I think we have a winner! I'll change the
names to unpin_user_page*().
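
For the record, that makes the acquire/release pairing look roughly like this
(prototypes sketched here from the existing helpers they replace, so treat the
exact signatures as an assumption):

long pin_user_pages(unsigned long start, unsigned long nr_pages,
                    unsigned int gup_flags, struct page **pages,
                    struct vm_area_struct **vmas);

void unpin_user_page(struct page *page);
void unpin_user_pages(struct page **pages, unsigned long npages);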


thanks,
--
John Hubbard
NVIDIA