v5 -> v6:
rebase code to latest linux-mm git tree
improve commit log
[3/3], rename GUP_FAST_LONGTERM_BENCHMARK as PIN_FAST_LONGTERM_BENCHMARK due
to LONGTERM should be a special case of pin page.
Cc: Ira Weiny <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
To: [email protected]
Cc: [email protected]
Pingfan Liu (3):
mm/gup: rename nr as nr_pinned in get_user_pages_fast()
mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path
mm/gup_benchemark: add LONGTERM_BENCHMARK test in gup fast path
mm/gup.c | 29 +++++++++++++++++++----------
mm/gup_benchmark.c | 7 +++++++
tools/testing/selftests/vm/gup_benchmark.c | 6 +++++-
3 files changed, 31 insertions(+), 11 deletions(-)
--
2.7.5
To better reflect the held state of pages and make code self-explaining,
rename nr as nr_pinned.
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Shuah Khan <[email protected]>
To: [email protected]
Cc: [email protected]
---
mm/gup.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index e8aaa40..9df77b1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2735,7 +2735,7 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
struct page **pages)
{
unsigned long addr, len, end;
- int nr = 0, ret = 0;
+ int nr_pinned = 0, ret = 0;
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET)))
@@ -2754,25 +2754,25 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
gup_fast_permitted(start, end)) {
local_irq_disable();
- gup_pgd_range(addr, end, gup_flags, pages, &nr);
+ gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
local_irq_enable();
- ret = nr;
+ ret = nr_pinned;
}
- if (nr < nr_pages) {
+ if (nr_pinned < nr_pages) {
/* Try to get the remaining pages with get_user_pages */
- start += nr << PAGE_SHIFT;
- pages += nr;
+ start += nr_pinned << PAGE_SHIFT;
+ pages += nr_pinned;
- ret = __gup_longterm_unlocked(start, nr_pages - nr,
+ ret = __gup_longterm_unlocked(start, nr_pages - nr_pinned,
gup_flags, pages);
/* Have to be a bit careful with return values */
- if (nr > 0) {
+ if (nr_pinned > 0) {
if (ret < 0)
- ret = nr;
+ ret = nr_pinned;
else
- ret += nr;
+ ret += nr_pinned;
}
}
--
2.7.5
FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
going to be given to hardware and can't move. It would truncate CMA
permanently and should be excluded.
In gup slow path, slow path, where
__gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM,
but in fast path, there lacks such a check, which means a possible leak of
CMA page to longterm pinned.
Place a check in try_grab_compound_head() in the fast path to fix the leak,
and if FOLL_LONGTERM happens on CMA, it will fall back to slow path to
migrate the page.
Some note about the check:
Huge page's subpages have the same migrate type due to either
allocation from a free_list[] or alloc_contig_range() with param
MIGRATE_MOVABLE. So it is enough to check on a single subpage
by is_migrate_cma_page(subpage)
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
To: [email protected]
Cc: [email protected]
---
mm/gup.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c
index 9df77b1..78132cf 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -89,6 +89,15 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
int orig_refs = refs;
/*
+ * Huge page's subpages have the same migrate type due to either
+ * allocation from a free_list[] or alloc_contig_range() with
+ * param MIGRATE_MOVABLE. So it is enough to check on a subpage.
+ */
+ if (unlikely(flags & FOLL_LONGTERM) &&
+ is_migrate_cma_page(page))
+ return NULL;
+
+ /*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
* track it, via hpage_pincount_add/_sub().
--
2.7.5
Introduce a PIN_FAST_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
path.
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Alexander Duyck <[email protected]>
To: [email protected]
Cc: [email protected]
---
mm/gup_benchmark.c | 7 +++++++
tools/testing/selftests/vm/gup_benchmark.c | 6 +++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index be690fa..09fba20 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -10,6 +10,7 @@
#define GUP_BENCHMARK _IOWR('g', 3, struct gup_benchmark)
#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark)
#define PIN_BENCHMARK _IOWR('g', 5, struct gup_benchmark)
+#define PIN_FAST_LONGTERM_BENCHMARK _IOWR('g', 6, struct gup_benchmark)
struct gup_benchmark {
__u64 get_delta_usec;
@@ -101,6 +102,11 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
nr = get_user_pages_fast(addr, nr, gup->flags,
pages + i);
break;
+ case PIN_FAST_LONGTERM_BENCHMARK:
+ nr = get_user_pages_fast(addr, nr,
+ gup->flags | FOLL_LONGTERM,
+ pages + i);
+ break;
case GUP_LONGTERM_BENCHMARK:
nr = get_user_pages(addr, nr,
gup->flags | FOLL_LONGTERM,
@@ -166,6 +172,7 @@ static long gup_benchmark_ioctl(struct file *filep, unsigned int cmd,
case GUP_BENCHMARK:
case PIN_FAST_BENCHMARK:
case PIN_BENCHMARK:
+ case PIN_FAST_LONGTERM_BENCHMARK:
break;
default:
return -EINVAL;
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index 43b4dfe..f024ff3 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -21,6 +21,7 @@
/* Similar to above, but use FOLL_PIN instead of FOLL_GET. */
#define PIN_FAST_BENCHMARK _IOWR('g', 4, struct gup_benchmark)
#define PIN_BENCHMARK _IOWR('g', 5, struct gup_benchmark)
+#define PIN_FAST_LONGTERM_BENCHMARK _IOWR('g', 6, struct gup_benchmark)
/* Just the flags we need, copied from mm.h: */
#define FOLL_WRITE 0x01 /* check pte is writable */
@@ -44,7 +45,7 @@ int main(int argc, char **argv)
char *file = "/dev/zero";
char *p;
- while ((opt = getopt(argc, argv, "m:r:n:f:abtTLUuwSH")) != -1) {
+ while ((opt = getopt(argc, argv, "m:r:n:f:abtTlLUuwSH")) != -1) {
switch (opt) {
case 'a':
cmd = PIN_FAST_BENCHMARK;
@@ -67,6 +68,9 @@ int main(int argc, char **argv)
case 'T':
thp = 0;
break;
+ case 'l':
+ cmd = PIN_FAST_LONGTERM_BENCHMARK;
+ break;
case 'L':
cmd = GUP_LONGTERM_BENCHMARK;
break;
--
2.7.5
On Mon, Mar 16, 2020 at 12:34:02PM +0800, Pingfan Liu wrote:
> To better reflect the held state of pages and make code self-explaining,
> rename nr as nr_pinned.
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Mon, Mar 16, 2020 at 12:34:03PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
> going to be given to hardware and can't move. It would truncate CMA
> permanently and should be excluded.
>
> In gup slow path, slow path, where
> __gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM,
> but in fast path, there lacks such a check, which means a possible leak of
> CMA page to longterm pinned.
>
> Place a check in try_grab_compound_head() in the fast path to fix the leak,
> and if FOLL_LONGTERM happens on CMA, it will fall back to slow path to
> migrate the page.
>
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: "Aneesh Kumar K.V" <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> To: [email protected]
> Cc: [email protected]
> ---
> mm/gup.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 9df77b1..78132cf 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -89,6 +89,15 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
> int orig_refs = refs;
>
> /*
> + * Huge page's subpages have the same migrate type due to either
> + * allocation from a free_list[] or alloc_contig_range() with
> + * param MIGRATE_MOVABLE. So it is enough to check on a subpage.
> + */
> + if (unlikely(flags & FOLL_LONGTERM) &&
> + is_migrate_cma_page(page))
> + return NULL;
Wrong indentation. We either align two tabs for continuation
statements, or after the opening brace of the previous line. With a
likely or unlikely thrown in the former IMHO looks much better. E.g.
if (unlikely(flags & FOLL_LONGTERM) &&
is_migrate_cma_page(page))
return NULL;
On Mon, Mar 16, 2020 at 4:55 PM Christoph Hellwig <[email protected]> wrote:
>
[...]
>
> Wrong indentation. We either align two tabs for continuation
> statements, or after the opening brace of the previous line. With a
> likely or unlikely thrown in the former IMHO looks much better. E.g.
>
> if (unlikely(flags & FOLL_LONGTERM) &&
> is_migrate_cma_page(page))
> return NULL;
>
OK, thanks. I will send out V7 to fix it.
Regards,
Pingfan
FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
going to be given to hardware and can't move. It would truncate CMA
permanently and should be excluded.
In gup slow path, slow path, where
__gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM,
but in fast path, there lacks such a check, which means a possible leak of
CMA page to longterm pinned.
Place a check in try_grab_compound_head() in the fast path to fix the leak,
and if FOLL_LONGTERM happens on CMA, it will fall back to slow path to
migrate the page.
Some note about the check:
Huge page's subpages have the same migrate type due to either
allocation from a free_list[] or alloc_contig_range() with param
MIGRATE_MOVABLE. So it is enough to check on a single subpage
by is_migrate_cma_page(subpage)
Signed-off-by: Pingfan Liu <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: John Hubbard <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
To: [email protected]
Cc: [email protected]
---
v6 -> v7: fix coding style issue
mm/gup.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/mm/gup.c b/mm/gup.c
index 9df77b1..0a536d7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -89,6 +89,15 @@ static __maybe_unused struct page *try_grab_compound_head(struct page *page,
int orig_refs = refs;
/*
+ * Huge page's subpages have the same migrate type due to either
+ * allocation from a free_list[] or alloc_contig_range() with
+ * param MIGRATE_MOVABLE. So it is enough to check on a subpage.
+ */
+ if (unlikely(flags & FOLL_LONGTERM) &&
+ is_migrate_cma_page(page))
+ return NULL;
+
+ /*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
* track it, via hpage_pincount_add/_sub().
--
2.7.5
On Tue, Mar 17, 2020 at 07:47:32PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
> going to be given to hardware and can't move. It would truncate CMA
> permanently and should be excluded.
>
> In gup slow path, slow path, where
> __gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM,
> but in fast path, there lacks such a check, which means a possible leak of
> CMA page to longterm pinned.
>
> Place a check in try_grab_compound_head() in the fast path to fix the leak,
> and if FOLL_LONGTERM happens on CMA, it will fall back to slow path to
> migrate the page.
>
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Tue, Mar 17, 2020 at 07:47:32PM +0800, Pingfan Liu wrote:
> FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
> going to be given to hardware and can't move. It would truncate CMA
> permanently and should be excluded.
>
> In gup slow path, slow path, where
> __gup_longterm_locked->check_and_migrate_cma_pages() handles FOLL_LONGTERM,
> but in fast path, there lacks such a check, which means a possible leak of
> CMA page to longterm pinned.
>
> Place a check in try_grab_compound_head() in the fast path to fix the leak,
> and if FOLL_LONGTERM happens on CMA, it will fall back to slow path to
> migrate the page.
>
> Some note about the check:
> Huge page's subpages have the same migrate type due to either
> allocation from a free_list[] or alloc_contig_range() with param
> MIGRATE_MOVABLE. So it is enough to check on a single subpage
> by is_migrate_cma_page(subpage)
>
> Signed-off-by: Pingfan Liu <[email protected]>
> Cc: Ira Weiny <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> Cc: Dan Williams <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> Cc: John Hubbard <[email protected]>
> Cc: "Aneesh Kumar K.V" <[email protected]>
> Cc: Christoph Hellwig <[email protected]>
> Cc: Shuah Khan <[email protected]>
> Cc: Jason Gunthorpe <[email protected]>
> To: [email protected]
> Cc: [email protected]
> ---
> v6 -> v7: fix coding style issue
> mm/gup.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
Much clearer, thank you
Reviewed-by: Jason Gunthorpe <[email protected]>
Jason
On Fri, Mar 20, 2020 at 6:27 AM John Hubbard <[email protected]> wrote:
>
> On 3/15/20 9:34 PM, Pingfan Liu wrote:
> > Introduce a PIN_FAST_LONGTERM_BENCHMARK ioctl to test longterm pin in gup fast
> > path.
>
> 1. The subject line still has "benchemark", which should be "benchmark".
>
> 2. What do you really want to test? More about the use case to be tested would help.
> Just another sentence. I wouldn't normally ask, but in this case the implementation
> is slightly scrambled, and I can't suggest how to fix it because there's no information
> in the commit log as to the use case, nor the motivation.
Oh, the history about [3/3] is to verify the implementation method of
[2/3]. Please refer to
https://lore.kernel.org/linux-mm/[email protected]/
Cite "
> > I think the concern is: for the successful gup_fast case with no CMA
> > pages, this patch is adding another complete loop through all the
> > pages. In the fast case.
> >
> > If the check were instead done as part of the gup_pte_range(), then
> > it would be a little more efficient for that case.
> >
> > As for whether it's worth it, *probably* this is too small an effect to measure.
> > But in order to attempt a measurement: running fio (https://github.com/axboe/fio)
> > with O_DIRECT on an NVMe drive, might shed some light. Here's an fio.conf file
> > that Jan Kara and Tom Talpey helped me come up with, for related testing:
"
But I think now, there is no motivation for it, and can be dropped it now.
Thanks,
Pingfan
On Fri, Mar 20, 2020 at 6:17 AM John Hubbard <[email protected]> wrote:
>
> On 3/17/20 4:47 AM, Pingfan Liu wrote:
> > FOLL_LONGTERM is a special case of FOLL_PIN. It suggests a pin which is
> > going to be given to hardware and can't move. It would truncate CMA
> > permanently and should be excluded.
> >
> > In gup slow path, slow path, where
>
>
> s/slow path, slow path/slow path/
Yeah.
[...]
> >
> > /*
> > + * Huge page's subpages have the same migrate type due to either
> > + * allocation from a free_list[] or alloc_contig_range() with
> > + * param MIGRATE_MOVABLE. So it is enough to check on a subpage.
> > + */
>
> Urggh, this comment is fine in the commit description, but at this location in the
> code it is completely incomprehensible! Instead of an extremely far-removed tidbit about
> interactions between CMA and huge pages, this comment should be explaining why we bail
> out early in the specific case of FOLL_PIN + FOLL_LONGTERM. And we don't bail out for
> FOLL_GET + FOLL_LONGTERM...
>
>
> I'm expect it is something like:
>
> /*
> * We can't do FOLL_LONGTERM + FOLL_PIN with CMA in the gup fast
> * path, so fail and let the caller fall back to the slow path.
> */
>
>
> ...approximately. Right?
Yes, right. And I think it is better to drop "We".
Thanks,
Pingfan