2019-10-30 22:55:00

by John Hubbard

Subject: [PATCH 00/19] mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM

Hi,

This applies cleanly to linux-next and mmotm, and also to linux.git if
linux-next's commit 20cac10710c9 ("mm/gup_benchmark: fix MAP_HUGETLB
case") is first applied there.

This provides tracking of dma-pinned pages. This is a prerequisite to
solving the larger problem of proper interactions between file-backed
pages and [R]DMA activities, as discussed in [1], [2], [3], and in
a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:

    get_user_pages()              (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages()              (sets FOLL_PIN)
    put_user_page()

Because there are interdependencies with FOLL_LONGTERM, a similar
conversion was applied for that case as well. The change was from this:

    get_user_pages(FOLL_LONGTERM) (also sets FOLL_GET)
    put_page()

to this:

    pin_longterm_pages()          (sets FOLL_PIN | FOLL_LONGTERM)
    put_user_page()

============================================================
Patch summary:

* Patches 1-4: refactoring and preparatory cleanup, independent fixes
(Patch 4: V4L2-core bug fix (can be separately applied))

* Patch 5: introduce pin_user_pages(), FOLL_PIN, but no functional
changes yet
* Patches 6-11: Convert existing put_user_page() callers to use the
new pin*() calls
* Patch 12: Activate tracking of FOLL_PIN pages.
* Patches 13-15: convert FOLL_LONGTERM callers
* Patches 16-17: gup_benchmark and run_vmtests support
* Patch 18: enforce FOLL_LONGTERM as a gup-internal (only) flag
* Patch 19: Documentation/vm/pin_user_pages.rst

============================================================
Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
and some directed testing to exercise some of the changes. As you can
see, gup_benchmark is enhanced to exercise this. Basically, I've been
able to runtime-test the core get_user_pages() and pin_user_pages() and
related routines, but not so much several of the call sites--those are
generally just a couple of changed lines each.

Not much of the kernel is actually using this, which on one hand
reduces risk quite a lot. But on the other hand, testing coverage
is low. So I'd love it if, in particular, the Infiniband and PowerPC
folks could do a smoke test of this series for me.

Also, my runtime testing for the call sites so far is very weak:

* io_uring: Some directed tests from liburing exercise this, and they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and passes.
* infiniband (still only have crude "IB pingpong" working, on a
good day: it's not exercising my conversions at runtime...)
* VFIO: compiles (I'm vowing to set up a runtime test soon, but it's
not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

============================================================
Next:

* Get the block/bio_vec sites converted to use pin_user_pages().

* Work with Ira and Dave Chinner to weave this together with the
layout lease stuff.

============================================================

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

John Hubbard (19):
mm/gup: pass flags arg to __gup_device_* functions
mm/gup: factor out duplicate code from four routines
goldfish_pipe: rename local pin_user_pages() routine
media/v4l2-core: set pages dirty upon releasing DMA buffers
mm/gup: introduce pin_user_pages*() and FOLL_PIN
goldfish_pipe: convert to pin_user_pages() and put_user_page()
infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*()
mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()
drm/via: set FOLL_PIN via pin_user_pages_fast()
fs/io_uring: set FOLL_PIN via pin_user_pages()
net/xdp: set FOLL_PIN via pin_user_pages()
mm/gup: track FOLL_PIN pages
media/v4l2-core: pin_longterm_pages (FOLL_PIN) and put_user_page()
conversion
vfio, mm: pin_longterm_pages (FOLL_PIN) and put_user_page() conversion
powerpc: book3s64: convert to pin_longterm_pages() and put_user_page()
mm/gup_benchmark: support pin_user_pages() and related calls
selftests/vm: run_vmtests: invoke gup_benchmark with basic FOLL_PIN
coverage
mm/gup: remove support for gup(FOLL_LONGTERM)
Documentation/vm: add pin_user_pages.rst

Documentation/vm/index.rst | 1 +
Documentation/vm/pin_user_pages.rst | 213 +++++++
arch/powerpc/mm/book3s64/iommu_api.c | 15 +-
drivers/gpu/drm/via/via_dmablit.c | 2 +-
drivers/infiniband/core/umem.c | 5 +-
drivers/infiniband/core/umem_odp.c | 10 +-
drivers/infiniband/hw/hfi1/user_pages.c | 4 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 8 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/infiniband/sw/siw/siw_mem.c | 5 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 10 +-
drivers/platform/goldfish/goldfish_pipe.c | 35 +-
drivers/vfio/vfio_iommu_type1.c | 15 +-
fs/io_uring.c | 5 +-
include/linux/mm.h | 133 ++++-
include/linux/mmzone.h | 2 +
include/linux/page_ref.h | 10 +
mm/gup.c | 622 ++++++++++++++++----
mm/gup_benchmark.c | 81 ++-
mm/huge_memory.c | 32 +-
mm/hugetlb.c | 28 +-
mm/memremap.c | 4 +-
mm/process_vm_access.c | 28 +-
mm/vmstat.c | 2 +
net/xdp/xdp_umem.c | 4 +-
tools/testing/selftests/vm/gup_benchmark.c | 28 +-
tools/testing/selftests/vm/run_vmtests | 22 +
29 files changed, 1066 insertions(+), 272 deletions(-)
create mode 100644 Documentation/vm/pin_user_pages.rst

--
2.23.0


2019-10-30 22:55:34

by John Hubbard

Subject: [PATCH 02/19] mm/gup: factor out duplicate code from four routines

There are four locations in gup.c that have a fair amount of code
duplication. This means that changing one requires making the same
changes in four places, not to mention reading the same code four
times, and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount, surrounded
by get_page()/put_page().

Also, further simplify (slightly), by waiting until the successful
end of each routine to increment *nr.

Cc: Christoph Hellwig <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
mm/gup.c | 113 ++++++++++++++++++++++---------------------------------
1 file changed, 46 insertions(+), 67 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 85caf76b3012..8fb0d9cdfaf5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1969,6 +1969,35 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
}
#endif

+static int __record_subpages(struct page *page, unsigned long addr,
+ unsigned long end, struct page **pages, int nr)
+{
+ int nr_recorded_pages = 0;
+
+ do {
+ pages[nr] = page;
+ nr++;
+ page++;
+ nr_recorded_pages++;
+ } while (addr += PAGE_SIZE, addr != end);
+ return nr_recorded_pages;
+}
+
+static void __remove_refs_from_head(struct page *page, int refs)
+{
+ /* Do a get_page() first, in case refs == page->_refcount */
+ get_page(page);
+ page_ref_sub(page, refs);
+ put_page(page);
+}
+
+static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
+{
+ *nr += nr_recorded_pages;
+ SetPageReferenced(head);
+ return 1;
+}
+
#ifdef CONFIG_ARCH_HAS_HUGEPD
static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
unsigned long sz)
@@ -1998,34 +2027,19 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));

- refs = 0;
head = pte_page(pte);
-
page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
- do {
- VM_BUG_ON(compound_head(page) != head);
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(head, refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pte_val(pte) != pte_val(*ptep))) {
- /* Could be optimized better */
- *nr -= refs;
- while (refs--)
- put_page(head);
+ __remove_refs_from_head(head, refs);
return 0;
}
-
- SetPageReferenced(head);
- return 1;
+ return __huge_pt_done(head, refs, nr);
}

static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
@@ -2071,30 +2085,18 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
pages, nr);
}

- refs = 0;
page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pmd_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ __remove_refs_from_head(head, refs);
return 0;
}
-
- SetPageReferenced(head);
- return 1;
+ return __huge_pt_done(head, refs, nr);
}

static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
@@ -2114,30 +2116,18 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
pages, nr);
}

- refs = 0;
page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pud_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pud_val(orig) != pud_val(*pudp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ __remove_refs_from_head(head, refs);
return 0;
}
-
- SetPageReferenced(head);
- return 1;
+ return __huge_pt_done(head, refs, nr);
}

static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
@@ -2151,30 +2141,19 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
return 0;

BUILD_BUG_ON(pgd_devmap(orig));
- refs = 0;
+
page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
- do {
- pages[*nr] = page;
- (*nr)++;
- page++;
- refs++;
- } while (addr += PAGE_SIZE, addr != end);
+ refs = __record_subpages(page, addr, end, pages, *nr);

head = try_get_compound_head(pgd_page(orig), refs);
- if (!head) {
- *nr -= refs;
+ if (!head)
return 0;
- }

if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
- *nr -= refs;
- while (refs--)
- put_page(head);
+ __remove_refs_from_head(head, refs);
return 0;
}
-
- SetPageReferenced(head);
- return 1;
+ return __huge_pt_done(head, refs, nr);
}

static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
--
2.23.0

2019-10-30 22:55:55

by John Hubbard

Subject: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

Convert drm/via to use the new pin_user_pages_fast() call, which sets
FOLL_PIN. Setting FOLL_PIN is now required for code that requires
tracking of pinned pages, and therefore for any code that calls
put_user_page().

Signed-off-by: John Hubbard <[email protected]>
---
drivers/gpu/drm/via/via_dmablit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 3db000aacd26..37c5e572993a 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer)
vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages));
if (NULL == vsg->pages)
return -ENOMEM;
- ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
+ ret = pin_user_pages_fast((unsigned long)xfer->mem_addr,
vsg->num_pages,
vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
vsg->pages);
--
2.23.0

2019-10-30 22:56:32

by John Hubbard

Subject: [PATCH 04/19] media/v4l2-core: set pages dirty upon releasing DMA buffers

After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was
coming from the device. However, this driver was just issuing a
bare put_page() call, without any set_page_dirty*() call.

Fix the problem, by calling set_page_dirty_lock() if the CPU pages
were potentially receiving data from the device.

Cc: Mauro Carvalho Chehab <[email protected]>
Signed-off-by: John Hubbard <[email protected]>
---
drivers/media/v4l2-core/videobuf-dma-sg.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c236a7..28262190c3ab 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
BUG_ON(dma->sglen);

if (dma->pages) {
- for (i = 0; i < dma->nr_pages; i++)
+ for (i = 0; i < dma->nr_pages; i++) {
+ if (dma->direction == DMA_FROM_DEVICE)
+ set_page_dirty_lock(dma->pages[i]);
put_page(dma->pages[i]);
+ }
kfree(dma->pages);
dma->pages = NULL;
}
--
2.23.0

2019-10-31 00:58:49

by John Hubbard

Subject: [PATCH 11/19] net/xdp: set FOLL_PIN via pin_user_pages()

Convert net/xdp to use the new pin_longterm_pages() call, which sets
FOLL_PIN. Setting FOLL_PIN is now required for code that requires
tracking of pinned pages.

Signed-off-by: John Hubbard <[email protected]>
---
net/xdp/xdp_umem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 16d5f353163a..4d56dfb1139a 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -285,8 +285,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
return -ENOMEM;

down_read(&current->mm->mmap_sem);
- npgs = get_user_pages(umem->address, umem->npgs,
- gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
+ npgs = pin_longterm_pages(umem->address, umem->npgs, gup_flags,
+ &umem->pgs[0], NULL);
up_read(&current->mm->mmap_sem);

if (npgs != umem->npgs) {
--
2.23.0

2019-10-31 18:36:56

by Ira Weiny

Subject: Re: [PATCH 02/19] mm/gup: factor out duplicate code from four routines

On Wed, Oct 30, 2019 at 03:49:13PM -0700, John Hubbard wrote:
> There are four locations in gup.c that have a fair amount of code
> duplication. This means that changing one requires making the same
> changes in four places, not to mention reading the same code four
> times, and wondering if there are subtle differences.
>
> Factor out the common code into static functions, thus reducing the
> overall line count and the code's complexity.
>
> Also, take the opportunity to slightly improve the efficiency of the
> error cases, by doing a mass subtraction of the refcount, surrounded
> by get_page()/put_page().
>
> Also, further simplify (slightly), by waiting until the successful
> end of each routine to increment *nr.

Overall it seems like a pretty good clean up. It did take a bit of review but
I _think_ it is correct. A couple of comments below.

>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Aneesh Kumar K.V <[email protected]>
> Signed-off-by: John Hubbard <[email protected]>
> ---
> mm/gup.c | 113 ++++++++++++++++++++++---------------------------------
> 1 file changed, 46 insertions(+), 67 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 85caf76b3012..8fb0d9cdfaf5 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1969,6 +1969,35 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
> }
> #endif
>
> +static int __record_subpages(struct page *page, unsigned long addr,
> + unsigned long end, struct page **pages, int nr)
> +{
> + int nr_recorded_pages = 0;
> +
> + do {
> + pages[nr] = page;
> + nr++;
> + page++;
> + nr_recorded_pages++;
> + } while (addr += PAGE_SIZE, addr != end);
> + return nr_recorded_pages;
> +}
> +
> +static void __remove_refs_from_head(struct page *page, int refs)
> +{
> + /* Do a get_page() first, in case refs == page->_refcount */
> + get_page(page);
> + page_ref_sub(page, refs);
> + put_page(page);
> +}

I wonder if this is better implemented as "put_compound_head()"? To match the
try_get_compound_head() call below?

> +
> +static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
> +{
> + *nr += nr_recorded_pages;
> + SetPageReferenced(head);
> + return 1;

When will this return anything but 1?

Ira

> +}
> +
> #ifdef CONFIG_ARCH_HAS_HUGEPD
> static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
> unsigned long sz)
> @@ -1998,34 +2027,19 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
> /* hugepages are never "special" */
> VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
>
> - refs = 0;
> head = pte_page(pte);
> -
> page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
> - do {
> - VM_BUG_ON(compound_head(page) != head);
> - pages[*nr] = page;
> - (*nr)++;
> - page++;
> - refs++;
> - } while (addr += PAGE_SIZE, addr != end);
> + refs = __record_subpages(page, addr, end, pages, *nr);
>
> head = try_get_compound_head(head, refs);
> - if (!head) {
> - *nr -= refs;
> + if (!head)
> return 0;
> - }
>
> if (unlikely(pte_val(pte) != pte_val(*ptep))) {
> - /* Could be optimized better */
> - *nr -= refs;
> - while (refs--)
> - put_page(head);
> + __remove_refs_from_head(head, refs);
> return 0;
> }
> -
> - SetPageReferenced(head);
> - return 1;
> + return __huge_pt_done(head, refs, nr);
> }
>
> static int gup_huge_pd(hugepd_t hugepd, unsigned long addr,
> @@ -2071,30 +2085,18 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
> pages, nr);
> }
>
> - refs = 0;
> page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
> - do {
> - pages[*nr] = page;
> - (*nr)++;
> - page++;
> - refs++;
> - } while (addr += PAGE_SIZE, addr != end);
> + refs = __record_subpages(page, addr, end, pages, *nr);
>
> head = try_get_compound_head(pmd_page(orig), refs);
> - if (!head) {
> - *nr -= refs;
> + if (!head)
> return 0;
> - }
>
> if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
> - *nr -= refs;
> - while (refs--)
> - put_page(head);
> + __remove_refs_from_head(head, refs);
> return 0;
> }
> -
> - SetPageReferenced(head);
> - return 1;
> + return __huge_pt_done(head, refs, nr);
> }
>
> static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> @@ -2114,30 +2116,18 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
> pages, nr);
> }
>
> - refs = 0;
> page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
> - do {
> - pages[*nr] = page;
> - (*nr)++;
> - page++;
> - refs++;
> - } while (addr += PAGE_SIZE, addr != end);
> + refs = __record_subpages(page, addr, end, pages, *nr);
>
> head = try_get_compound_head(pud_page(orig), refs);
> - if (!head) {
> - *nr -= refs;
> + if (!head)
> return 0;
> - }
>
> if (unlikely(pud_val(orig) != pud_val(*pudp))) {
> - *nr -= refs;
> - while (refs--)
> - put_page(head);
> + __remove_refs_from_head(head, refs);
> return 0;
> }
> -
> - SetPageReferenced(head);
> - return 1;
> + return __huge_pt_done(head, refs, nr);
> }
>
> static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
> @@ -2151,30 +2141,19 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
> return 0;
>
> BUILD_BUG_ON(pgd_devmap(orig));
> - refs = 0;
> +
> page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
> - do {
> - pages[*nr] = page;
> - (*nr)++;
> - page++;
> - refs++;
> - } while (addr += PAGE_SIZE, addr != end);
> + refs = __record_subpages(page, addr, end, pages, *nr);
>
> head = try_get_compound_head(pgd_page(orig), refs);
> - if (!head) {
> - *nr -= refs;
> + if (!head)
> return 0;
> - }
>
> if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
> - *nr -= refs;
> - while (refs--)
> - put_page(head);
> + __remove_refs_from_head(head, refs);
> return 0;
> }
> -
> - SetPageReferenced(head);
> - return 1;
> + return __huge_pt_done(head, refs, nr);
> }
>
> static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
> --
> 2.23.0
>
>

2019-10-31 18:46:00

by John Hubbard

Subject: Re: [PATCH 02/19] mm/gup: factor out duplicate code from four routines

On 10/31/19 11:35 AM, Ira Weiny wrote:
> On Wed, Oct 30, 2019 at 03:49:13PM -0700, John Hubbard wrote:
...
>> +
>> +static void __remove_refs_from_head(struct page *page, int refs)
>> +{
>> + /* Do a get_page() first, in case refs == page->_refcount */
>> + get_page(page);
>> + page_ref_sub(page, refs);
>> + put_page(page);
>> +}
>
> I wonder if this is better implemented as "put_compound_head()"? To match the
> try_get_compound_head() call below?

Hi Ira,

Good idea, I'll rename it to that.

>
>> +
>> +static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
>> +{
>> + *nr += nr_recorded_pages;
>> + SetPageReferenced(head);
>> + return 1;
>
> When will this return anything but 1?
>

Never, but it saves a line at all four call sites, by having it return like that.

I could see how maybe people would prefer to just have it be a void function,
and return 1 directly at the call sites. Since this was a lower line count I
thought maybe it would be slightly better, but it's hard to say really.

thanks,

John Hubbard
NVIDIA

2019-10-31 22:49:32

by Ira Weiny

Subject: Re: [PATCH 02/19] mm/gup: factor out duplicate code from four routines

On Thu, Oct 31, 2019 at 11:43:37AM -0700, John Hubbard wrote:
> On 10/31/19 11:35 AM, Ira Weiny wrote:
> > On Wed, Oct 30, 2019 at 03:49:13PM -0700, John Hubbard wrote:
> ...
> >> +
> >> +static void __remove_refs_from_head(struct page *page, int refs)
> >> +{
> >> + /* Do a get_page() first, in case refs == page->_refcount */
> >> + get_page(page);
> >> + page_ref_sub(page, refs);
> >> + put_page(page);
> >> +}
> >
> > I wonder if this is better implemented as "put_compound_head()"? To match the
> > try_get_compound_head() call below?
>
> Hi Ira,
>
> Good idea, I'll rename it to that.
>
> >
> >> +
> >> +static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
> >> +{
> >> + *nr += nr_recorded_pages;
> >> + SetPageReferenced(head);
> >> + return 1;
> >
> > When will this return anything but 1?
> >
>
> Never, but it saves a line at all four call sites, by having it return like that.
>
> I could see how maybe people would prefer to just have it be a void function,
> and return 1 directly at the call sites. Since this was a lower line count I
> thought maybe it would be slightly better, but it's hard to say really.

It is a nit perhaps, but I feel like the signature of a function should stand
on its own. What this does is mix the meaning of this function with those
calling it, which IMO is not good style.

We can see what others say.

Ira

>
> thanks,
>
> John Hubbard
> NVIDIA
>

2019-10-31 22:57:31

by John Hubbard

Subject: Re: [PATCH 02/19] mm/gup: factor out duplicate code from four routines

On 10/31/19 2:09 PM, Ira Weiny wrote:
> On Thu, Oct 31, 2019 at 11:43:37AM -0700, John Hubbard wrote:
>> On 10/31/19 11:35 AM, Ira Weiny wrote:
>>> On Wed, Oct 30, 2019 at 03:49:13PM -0700, John Hubbard wrote:
>> ...
>>>> +
>>>> +static int __huge_pt_done(struct page *head, int nr_recorded_pages, int *nr)
>>>> +{
>>>> + *nr += nr_recorded_pages;
>>>> + SetPageReferenced(head);
>>>> + return 1;
>>>
>>> When will this return anything but 1?
>>>
>>
>> Never, but it saves a line at all four call sites, by having it return like that.
>>
>> I could see how maybe people would prefer to just have it be a void function,
>> and return 1 directly at the call sites. Since this was a lower line count I
>> thought maybe it would be slightly better, but it's hard to say really.
>
> It is a nit perhaps, but I feel like the signature of a function should stand
> on its own. What this does is mix the meaning of this function with those
> calling it, which IMO is not good style.
>
> We can see what others say.
>

Sure. I'll plan on changing it to a void return type, then, unless someone else
pipes up.
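
I.e. roughly this (sketch only, based on the helper as posted above):

        static void __huge_pt_done(struct page *head, int nr_recorded_pages,
                                    int *nr)
        {
                /* Commit the recorded subpages and mark the head referenced. */
                *nr += nr_recorded_pages;
                SetPageReferenced(head);
        }

...and then each call site would end with:

                __huge_pt_done(head, refs, nr);
                return 1;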


thanks,

John Hubbard
NVIDIA

2019-10-31 23:39:23

by Ira Weiny

Subject: Re: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

On Wed, Oct 30, 2019 at 03:49:20PM -0700, John Hubbard wrote:
> Convert drm/via to use the new pin_user_pages_fast() call, which sets
> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> tracking of pinned pages, and therefore for any code that calls
> put_user_page().
>

Reviewed-by: Ira Weiny <[email protected]>

> Signed-off-by: John Hubbard <[email protected]>
> ---
> drivers/gpu/drm/via/via_dmablit.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
> index 3db000aacd26..37c5e572993a 100644
> --- a/drivers/gpu/drm/via/via_dmablit.c
> +++ b/drivers/gpu/drm/via/via_dmablit.c
> @@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer)
> vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages));
> if (NULL == vsg->pages)
> return -ENOMEM;
> - ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
> + ret = pin_user_pages_fast((unsigned long)xfer->mem_addr,
> vsg->num_pages,
> vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
> vsg->pages);
> --
> 2.23.0
>

2019-10-31 23:43:52

by Ira Weiny

Subject: Re: [PATCH 11/19] net/xdp: set FOLL_PIN via pin_user_pages()

On Wed, Oct 30, 2019 at 03:49:22PM -0700, John Hubbard wrote:
> Convert net/xdp to use the new pin_longterm_pages() call, which sets
> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> tracking of pinned pages.
>

Reviewed-by: Ira Weiny <[email protected]>

> Signed-off-by: John Hubbard <[email protected]>
> ---
> net/xdp/xdp_umem.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 16d5f353163a..4d56dfb1139a 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -285,8 +285,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
> return -ENOMEM;
>
> down_read(&current->mm->mmap_sem);
> - npgs = get_user_pages(umem->address, umem->npgs,
> - gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
> + npgs = pin_longterm_pages(umem->address, umem->npgs, gup_flags,
> + &umem->pgs[0], NULL);
> up_read(&current->mm->mmap_sem);
>
> if (npgs != umem->npgs) {
> --
> 2.23.0
>

2019-11-02 11:05:15

by Björn Töpel

Subject: Re: [PATCH 11/19] net/xdp: set FOLL_PIN via pin_user_pages()

On 2019-10-30 23:49, John Hubbard wrote:
> Convert net/xdp to use the new pin_longterm_pages() call, which sets
> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> tracking of pinned pages.
>
> Signed-off-by: John Hubbard <[email protected]>

Acked-by: Björn Töpel <[email protected]>

> ---
> net/xdp/xdp_umem.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
> index 16d5f353163a..4d56dfb1139a 100644
> --- a/net/xdp/xdp_umem.c
> +++ b/net/xdp/xdp_umem.c
> @@ -285,8 +285,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
> return -ENOMEM;
>
> down_read(&current->mm->mmap_sem);
> - npgs = get_user_pages(umem->address, umem->npgs,
> - gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
> + npgs = pin_longterm_pages(umem->address, umem->npgs, gup_flags,
> + &umem->pgs[0], NULL);
> up_read(&current->mm->mmap_sem);
>
> if (npgs != umem->npgs) {
>

2019-11-04 18:14:04

by Daniel Vetter

Subject: Re: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

On Thu, Oct 31, 2019 at 04:36:28PM -0700, Ira Weiny wrote:
> On Wed, Oct 30, 2019 at 03:49:20PM -0700, John Hubbard wrote:
> > Convert drm/via to use the new pin_user_pages_fast() call, which sets
> > FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> > tracking of pinned pages, and therefore for any code that calls
> > put_user_page().
> >
>
> Reviewed-by: Ira Weiny <[email protected]>

No one's touching the via driver anymore, so feel free to merge this
through whatever tree suits best (aka I'll drop this on the floor and
forget about it now).

Acked-by: Daniel Vetter <[email protected]>

>
> > Signed-off-by: John Hubbard <[email protected]>
> > ---
> > drivers/gpu/drm/via/via_dmablit.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
> > index 3db000aacd26..37c5e572993a 100644
> > --- a/drivers/gpu/drm/via/via_dmablit.c
> > +++ b/drivers/gpu/drm/via/via_dmablit.c
> > @@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer)
> > vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages));
> > if (NULL == vsg->pages)
> > return -ENOMEM;
> > - ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
> > + ret = pin_user_pages_fast((unsigned long)xfer->mem_addr,
> > vsg->num_pages,
> > vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
> > vsg->pages);
> > --
> > 2.23.0
> >

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2019-11-04 19:23:40

by John Hubbard

Subject: Re: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

On 11/4/19 10:10 AM, Daniel Vetter wrote:
> On Thu, Oct 31, 2019 at 04:36:28PM -0700, Ira Weiny wrote:
>> On Wed, Oct 30, 2019 at 03:49:20PM -0700, John Hubbard wrote:
>>> Convert drm/via to use the new pin_user_pages_fast() call, which sets
>>> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
>>> tracking of pinned pages, and therefore for any code that calls
>>> put_user_page().
>>>
>>
>> Reviewed-by: Ira Weiny <[email protected]>
>
> No one's touching the via driver anymore, so feel free to merge this
> through whatever tree suits best (aka I'll drop this on the floor and
> forget about it now).
>
> Acked-by: Daniel Vetter <[email protected]>
>

OK, great. Yes, in fact, I'm hoping Andrew can just push the whole series
in through the mm tree, because that would allow it to be done in one
shot, in 5.5


thanks,

John Hubbard
NVIDIA

2019-11-05 09:53:36

by Daniel Vetter

Subject: Re: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

On Mon, Nov 04, 2019 at 11:20:38AM -0800, John Hubbard wrote:
> On 11/4/19 10:10 AM, Daniel Vetter wrote:
> > On Thu, Oct 31, 2019 at 04:36:28PM -0700, Ira Weiny wrote:
> >> On Wed, Oct 30, 2019 at 03:49:20PM -0700, John Hubbard wrote:
> >>> Convert drm/via to use the new pin_user_pages_fast() call, which sets
> >>> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
> >>> tracking of pinned pages, and therefore for any code that calls
> >>> put_user_page().
> >>>
> >>
> >> Reviewed-by: Ira Weiny <[email protected]>
> >
> > No one's touching the via driver anymore, so feel free to merge this
> > through whatever tree suits best (aka I'll drop this on the floor and
> > forget about it now).
> >
> > Acked-by: Daniel Vetter <[email protected]>
> >
>
> OK, great. Yes, in fact, I'm hoping Andrew can just push the whole series
> in through the mm tree, because that would allow it to be done in one
> shot, in 5.5

btw is there more? We should have a bunch more userptr stuff in various
drivers, so was really surprised that drm/via is the only thing in your
series.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

2019-11-05 18:17:34

by John Hubbard

Subject: Re: [PATCH 09/19] drm/via: set FOLL_PIN via pin_user_pages_fast()

On 11/5/19 1:49 AM, Daniel Vetter wrote:
> On Mon, Nov 04, 2019 at 11:20:38AM -0800, John Hubbard wrote:
>> On 11/4/19 10:10 AM, Daniel Vetter wrote:
>>> On Thu, Oct 31, 2019 at 04:36:28PM -0700, Ira Weiny wrote:
>>>> On Wed, Oct 30, 2019 at 03:49:20PM -0700, John Hubbard wrote:
>>>>> Convert drm/via to use the new pin_user_pages_fast() call, which sets
>>>>> FOLL_PIN. Setting FOLL_PIN is now required for code that requires
>>>>> tracking of pinned pages, and therefore for any code that calls
>>>>> put_user_page().
>>>>>
>>>>
>>>> Reviewed-by: Ira Weiny <[email protected]>
>>>
>>> No one's touching the via driver anymore, so feel free to merge this
>>> through whatever tree suits best (aka I'll drop this on the floor and
>>> forget about it now).
>>>
>>> Acked-by: Daniel Vetter <[email protected]>
>>>
>>
>> OK, great. Yes, in fact, I'm hoping Andrew can just push the whole series
>> in through the mm tree, because that would allow it to be done in one
>> shot, in 5.5
>
> btw is there more? We should have a bunch more userptr stuff in various
> drivers, so was really surprised that drm/via is the only thing in your
> series.


There is more, but:

1) Fortunately, the opt-in nature of FOLL_PIN allows converting a few call
sites at a time. And so this patchset limits itself to converting the bare
minimum required to get started, which is:

a) call sites that have already been converted to put_user_page(),
and

b) call sites that set FOLL_LONGTERM.

So yes, follow-up patches will be required. This is not everything.
In fact, if I can fix this series up quickly enough that it makes it into
mmotm soon-ish, then there may be time to get some follow-up patches on
top of it, in time for 5.5.


2) If I recall correctly, Jerome and maybe others are working to remove
as many get_user_pages() callers from drm as possible, and to use a
non-pinned page approach with mmu notifiers instead. I'm not sure of
the exact status of that work, but I see that etnaviv, amdgpu, i915, and
radeon still call gup() in linux-next.

Anyway, some of those call sites will disappear, although I'd expect a
few to remain, because I doubt the simpler GPUs can support page faulting.



thanks,

John Hubbard
NVIDIA