2018-11-20 16:00:50

by Michal Hocko

Subject: [RFC PATCH 0/3] few memory offlining enhancements

Recently I have been chasing an issue where memory offlining fails to make
progress. On the way I have noticed a few weird decisions in the code. The
migration itself is restricted without a reasonable justification, and the
retry loop around the migration is quite messy. This is addressed by patches
1 and 2.

Patch 3 targets the faultaround code, which has been a hot candidate both
for the initial issue reported upstream [1] and for the one I am debugging
internally. It turned out not to be the main contributor in the end, but I
believe we should address it regardless. See the patch description for more
details.

[1] http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv




2018-11-20 13:44:48

by Michal Hocko

Subject: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining

From: Michal Hocko <[email protected]>

Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscated by a goto retry loop. The code
is hard to follow and, as a result, it is even suboptimal because each
retry round scans the full range from start_pfn even though we have
already successfully scanned/migrated the [start_pfn, pfn] range. All of
this only because a check_pages_isolated failure has to rescan the full
range again.

De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact, remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
get slightly more optimal code which is also more readable.

Signed-off-by: Michal Hocko <[email protected]>
---
mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..9cd161db3061 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
goto failed_removal_isolated;
}

- pfn = start_pfn;
-repeat:
- /* start memory hot removal */
- ret = -EINTR;
- if (signal_pending(current)) {
- reason = "signal backoff";
- goto failed_removal_isolated;
- }
+ do {
+ for (pfn = start_pfn; pfn;)
+ {
+ /* start memory hot removal */
+ ret = -EINTR;
+ if (signal_pending(current)) {
+ reason = "signal backoff";
+ goto failed_removal_isolated;
+ }

- cond_resched();
- lru_add_drain_all();
- drain_all_pages(zone);
+ cond_resched();
+ lru_add_drain_all();
+ drain_all_pages(zone);

- pfn = scan_movable_pages(start_pfn, end_pfn);
- if (pfn) { /* We have movable pages */
- ret = do_migrate_range(pfn, end_pfn);
- goto repeat;
- }
+ pfn = scan_movable_pages(pfn, end_pfn);
+ if (pfn) {
+ /* TODO fatal migration failures should bail out */
+ do_migrate_range(pfn, end_pfn);
+ }
+ }
+
+ /*
+ * dissolve free hugepages in the memory block before doing offlining
+ * actually in order to make hugetlbfs's object counting consistent.
+ */
+ ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+ if (ret) {
+ reason = "failure to dissolve huge pages";
+ goto failed_removal_isolated;
+ }
+ /* check again */
+ offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+ } while (offlined_pages < 0);

- /*
- * dissolve free hugepages in the memory block before doing offlining
- * actually in order to make hugetlbfs's object counting consistent.
- */
- ret = dissolve_free_huge_pages(start_pfn, end_pfn);
- if (ret) {
- reason = "failure to dissolve huge pages";
- goto failed_removal_isolated;
- }
- /* check again */
- offlined_pages = check_pages_isolated(start_pfn, end_pfn);
- if (offlined_pages < 0)
- goto repeat;
pr_info("Offlined Pages %ld\n", offlined_pages);
/* Ok, all of our target is isolated.
We cannot do rollback at this point. */
--
2.19.1


2018-11-20 13:45:11

by Michal Hocko

Subject: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

From: Michal Hocko <[email protected]>

filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct, it
can also influence page migration, which will bail out when seeing
an elevated reference count. The faultaround code would bail out on
seeing a locked page anyway, so we can proactively check the PageLocked
bit before page_cache_get_speculative and prevent pointless
reference count churn.

Cc: "Kirill A. Shutemov" <[email protected]>
Suggested-by: Jan Kara <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/filemap.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..c76d6a251770 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
goto next;

head = compound_head(page);
+
+ if (PageLocked(head))
+ goto next;
if (!page_cache_get_speculative(head))
goto next;

--
2.19.1
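
For context, the reason the elevated reference count matters is that the
migration core refuses to migrate a page whose references it cannot fully
account for. Very roughly, this is the check in migrate_page_move_mapping()
(an illustrative, heavily simplified sketch, not part of this patch):

        /* migrate_page_move_mapping(), simplified */
        if (page_count(page) != expected_count)
                return -EAGAIN; /* an extra (e.g. speculative) reference fails this attempt */

The PageLocked check added by the patch avoids taking such a reference at
all in the case where fault-around would give up on the page anyway.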


2018-11-20 13:45:29

by Michal Hocko

Subject: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

From: Michal Hocko <[email protected]>

do_migrate_range has been limiting the number of pages to migrate to 256
for some reason which is not documented. Even if the limit made some
sense back when it was introduced, it doesn't really serve a good
purpose these days. If the range contains huge pages then we break out
of the loop too early and have to go through the LRU and pcp cache
draining and scan_movable_pages again, which is quite suboptimal.

The only reason to limit the number of pages I can think of is to reduce
the potential time to react to a fatal signal. But even then the
number of pages is a questionable metric because even a single page
migration might block in a non-killable state (e.g. __unmap_and_move).

Remove the limit and offline the full requested range (this is one
memory block worth of pages with the current code). Should we ever get a
report that offlining takes too long to react to a fatal signal then we
should rather fix the core migration to use killable waits and bail out
on a signal.

Signed-off-by: Michal Hocko <[email protected]>
---
mm/memory_hotplug.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c82193db4be6..6263c8cd4491 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
return new_page_nodemask(page, nid, &nmask);
}

-#define NR_OFFLINE_AT_ONCE_PAGES (256)
static int
do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long pfn;
struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
int not_managed = 0;
int ret = 0;
LIST_HEAD(source);

- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
ret = -EBUSY;
break;
}
- if (isolate_huge_page(page, &source))
- move_pages -= 1 << compound_order(head);
+ isolate_huge_page(page, &source);
continue;
} else if (PageTransHuge(page))
pfn = page_to_pfn(compound_head(page))
@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (!ret) { /* Success */
put_page(page);
list_add_tail(&page->lru, &source);
- move_pages--;
if (!__PageMovable(page))
inc_node_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
--
2.19.1


2018-11-20 14:09:24

by Kirill A. Shutemov

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.

Looks fine to me.

But please drop a line of comment in the code. As is it might be confusing
for a reader.

--
Kirill A. Shutemov

2018-11-20 14:19:47

by David Hildenbrand

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> do_migrate_range has been limiting the number of pages to migrate to 256
> for some reason which is not documented. Even if the limit made some
> sense back then when it was introduced it doesn't really serve a good
> purpose these days. If the range contains huge pages then
> we break out of the loop too early and go through LRU and pcp
> caches draining and scan_movable_pages is quite suboptimal.
>
> The only reason to limit the number of pages I can think of is to reduce
> the potential time to react on the fatal signal. But even then the
> number of pages is a questionable metric because even a single page
> might migration block in a non-killable state (e.g. __unmap_and_move).
>
> Remove the limit and offline the full requested range (this is one
> membblock worth of pages with the current code). Should we ever get a
> report that offlining takes too long to react on fatal signal then we
> should rather fix the core migration to use killable waits and bailout
> on a signal.
>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/memory_hotplug.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c82193db4be6..6263c8cd4491 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
> return new_page_nodemask(page, nid, &nmask);
> }
>
> -#define NR_OFFLINE_AT_ONCE_PAGES (256)
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> unsigned long pfn;
> struct page *page;
> - int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> int not_managed = 0;
> int ret = 0;
> LIST_HEAD(source);
>
> - for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
> + for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!pfn_valid(pfn))
> continue;
> page = pfn_to_page(pfn);
> @@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> ret = -EBUSY;
> break;
> }
> - if (isolate_huge_page(page, &source))
> - move_pages -= 1 << compound_order(head);
> + isolate_huge_page(page, &source);
> continue;
> } else if (PageTransHuge(page))
> pfn = page_to_pfn(compound_head(page))
> @@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> if (!ret) { /* Success */
> put_page(page);
> list_add_tail(&page->lru, &source);
> - move_pages--;
> if (!__PageMovable(page))
> inc_node_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
>

Yes, there is basically no statement why it was done that way. If it is
important, there should be one.

(we could also check for pending signals inside that function if really
required)

Reviewed-by: David Hildenbrand <[email protected]>

--

Thanks,

David / dhildenb

2018-11-20 14:26:50

by Michal Hocko

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On Tue 20-11-18 15:18:41, David Hildenbrand wrote:
[...]
> (we could also check for pending signals inside that function if really
> required)

do_migrate_range is not the proper layer to check signals, because the
loop there only isolates pages and that is not expensive. The most
expensive part is deeper down in the migration core. We wait for the page
lock or writeback and that can take a long time. None of those waits is
killable; making them killable is a larger surgery, but something that we
should consider should there be any need to address this.

> Reviewed-by: David Hildenbrand <[email protected]>

Thanks!
--
Michal Hocko
SUSE Labs
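
To illustrate what such a change could look like (a hypothetical sketch
only, not something posted in this thread; lock_page_killable() is the
existing helper that backs off on a fatal signal, and the exact placement
in __unmap_and_move() is an assumption):

        /* e.g. in __unmap_and_move(), instead of an unconditional lock_page() */
        if (!trylock_page(page)) {
                if (!force || mode == MIGRATE_ASYNC)
                        goto out;
                if (lock_page_killable(page)) {
                        /* a fatal signal is pending; back off instead of sleeping */
                        rc = -EINTR;
                        goto out;
                }
        }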

2018-11-20 14:27:18

by Michal Hocko

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue 20-11-18 17:17:00, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> > On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > > From: Michal Hocko <[email protected]>
> > > >
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > >
> > > Looks fine to me.
> >
> > Thanks for the review.
> >
> > > But please drop a line of comment in the code. As is it might be confusing
> > > for a reader.
> >
> > This?
>
> Yep.
>
> Acked-by: Kirill A. Shutemov <[email protected]>

Cool, thanks! I will wait some more time for review feedback on the other
patches and then repost with this folded in.

--
Michal Hocko
SUSE Labs

2018-11-20 14:27:49

by David Hildenbrand

Subject: Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> Memory migration might fail during offlining and we keep retrying in
> that case. This is currently obfuscate by goto retry loop. The code
> is hard to follow and as a result it is even suboptimal becase each
> retry round scans the full range from start_pfn even though we have
> successfully scanned/migrated [start_pfn, pfn] range already. This
> is all only because check_pages_isolated failure has to rescan the full
> range again.
>
> De-obfuscate the migration retry loop by promoting it to a real for
> loop. In fact remove the goto altogether by making it a proper double
> loop (yeah, gotos are nasty in this specific case). In the end we
> will get a slightly more optimal code which is better readable.
>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/memory_hotplug.c | 60 +++++++++++++++++++++++----------------------
> 1 file changed, 31 insertions(+), 29 deletions(-)
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 6263c8cd4491..9cd161db3061 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1591,38 +1591,40 @@ static int __ref __offline_pages(unsigned long start_pfn,
> goto failed_removal_isolated;
> }
>
> - pfn = start_pfn;
> -repeat:
> - /* start memory hot removal */
> - ret = -EINTR;
> - if (signal_pending(current)) {
> - reason = "signal backoff";
> - goto failed_removal_isolated;
> - }
> + do {
> + for (pfn = start_pfn; pfn;)
> + {

{ on a new line looks weird.

> + /* start memory hot removal */
> + ret = -EINTR;

I think we can move that into the "if (signal_pending(current))"

(if my eyes are not wrong, this will not be touched otherwise)

> + if (signal_pending(current)) {
> + reason = "signal backoff";
> + goto failed_removal_isolated;
> + }
>
> - cond_resched();
> - lru_add_drain_all();
> - drain_all_pages(zone);
> + cond_resched();
> + lru_add_drain_all();
> + drain_all_pages(zone);
>
> - pfn = scan_movable_pages(start_pfn, end_pfn);
> - if (pfn) { /* We have movable pages */
> - ret = do_migrate_range(pfn, end_pfn);
> - goto repeat;
> - }
> + pfn = scan_movable_pages(pfn, end_pfn);
> + if (pfn) {
> + /* TODO fatal migration failures should bail out */
> + do_migrate_range(pfn, end_pfn);

Right, that return value was always ignored.

> + }
> + }
> +
> + /*
> + * dissolve free hugepages in the memory block before doing offlining
> + * actually in order to make hugetlbfs's object counting consistent.
> + */
> + ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> + if (ret) {
> + reason = "failure to dissolve huge pages";
> + goto failed_removal_isolated;
> + }
> + /* check again */
> + offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> + } while (offlined_pages < 0);
>
> - /*
> - * dissolve free hugepages in the memory block before doing offlining
> - * actually in order to make hugetlbfs's object counting consistent.
> - */
> - ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> - if (ret) {
> - reason = "failure to dissolve huge pages";
> - goto failed_removal_isolated;
> - }
> - /* check again */
> - offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> - if (offlined_pages < 0)
> - goto repeat;
> pr_info("Offlined Pages %ld\n", offlined_pages);
> /* Ok, all of our target is isolated.
> We cannot do rollback at this point. */
>

Looks much better to me.


--

Thanks,

David / dhildenb

2018-11-20 15:04:03

by Pasha Tatashin

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On 18-11-20 14:43:21, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> do_migrate_range has been limiting the number of pages to migrate to 256
> for some reason which is not documented. Even if the limit made some
> sense back then when it was introduced it doesn't really serve a good
> purpose these days. If the range contains huge pages then
> we break out of the loop too early and go through LRU and pcp
> caches draining and scan_movable_pages is quite suboptimal.
>
> The only reason to limit the number of pages I can think of is to reduce
> the potential time to react on the fatal signal. But even then the
> number of pages is a questionable metric because even a single page
> might migration block in a non-killable state (e.g. __unmap_and_move).
>
> Remove the limit and offline the full requested range (this is one
> membblock worth of pages with the current code). Should we ever get a
> report that offlining takes too long to react on fatal signal then we
> should rather fix the core migration to use killable waits and bailout
> on a signal.
>
> Signed-off-by: Michal Hocko <[email protected]>

Looks good to me; I also do not see a reason for the 256 page limit.

Reviewed-by: Pavel Tatashin <[email protected]>

Added Kame to CC, who introduced page offlining and this limit, but as
far as I can tell the last time he was active on LKML was in 2016.

Pasha

2018-11-20 15:30:40

by Michal Hocko

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On Tue 20-11-18 15:51:32, Oscar Salvador wrote:
> On Tue, 2018-11-20 at 14:43 +0100, Michal Hocko wrote:
> > From: Michal Hocko <[email protected]>
> >
> > do_migrate_range has been limiting the number of pages to migrate to
> > 256
> > for some reason which is not documented.
>
> When looking back at old memory-hotplug commits one feels pretty sad
> about the brevity of the changelogs.

Well, things evolve and we've become much more careful about changelogs
over time. It still takes quite a lot of time to push back on changelogs
even these days, though. People keep forgetting that "what" is not
as important as "why", because the former is usually quite easy to
understand from reading the diff. The intention behind a change is usually
what gets forgotten after years. I guess people realize this much more
after a few git blame excavation tours.

> > Signed-off-by: Michal Hocko <[email protected]>
>
> Reviewed-by: Oscar Salvador <[email protected]>

Thanks!
--
Michal Hocko
SUSE Labs

2018-11-20 15:37:18

by Oscar Salvador

Subject: Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining


> Signed-off-by: Michal Hocko <[email protected]>
[...]
> + do {
> + for (pfn = start_pfn; pfn;)
> + {
> + /* start memory hot removal */

Should we change that comment? I mean, this is not really the
hot-removal stage.

Maybe "start memory migration" suits better? Or "memory offlining"?

> + ret = -EINTR;
> + if (signal_pending(current)) {
> + reason = "signal backoff";
> + goto failed_removal_isolated;
> + }
>
> - cond_resched();
> - lru_add_drain_all();
> - drain_all_pages(zone);
> + cond_resched();
> + lru_add_drain_all();
> + drain_all_pages(zone);
>
> - pfn = scan_movable_pages(start_pfn, end_pfn);
> - if (pfn) { /* We have movable pages */
> - ret = do_migrate_range(pfn, end_pfn);
> - goto repeat;
> - }
> + pfn = scan_movable_pages(pfn, end_pfn);
> + if (pfn) {
> + /* TODO fatal migration failures should bail out */
> + do_migrate_range(pfn, end_pfn);
> + }
> + }
> +
> + /*
> + * dissolve free hugepages in the memory block before doing offlining
> + * actually in order to make hugetlbfs's object counting consistent.
> + */
> + ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> + if (ret) {
> + reason = "failure to dissolve huge pages";
> + goto failed_removal_isolated;
> + }
> + /* check again */
> + offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> + } while (offlined_pages < 0);
>
> - /*
> - * dissolve free hugepages in the memory block before doing offlining
> - * actually in order to make hugetlbfs's object counting consistent.
> - */
> - ret = dissolve_free_huge_pages(start_pfn, end_pfn);
> - if (ret) {
> - reason = "failure to dissolve huge pages";
> - goto failed_removal_isolated;
> - }
> - /* check again */
> - offlined_pages = check_pages_isolated(start_pfn, end_pfn);
> - if (offlined_pages < 0)
> - goto repeat;

This indeed looks much nicer and it is easier to follow.
With the changes commented by David:

Reviewed-by: Oscar Salvador <[email protected]>

2018-11-20 16:00:58

by Michal Hocko

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > From: Michal Hocko <[email protected]>
> >
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
>
> Looks fine to me.

Thanks for the review.

> But please drop a line of comment in the code. As is it might be confusing
> for a reader.

This?

diff --git a/mm/filemap.c b/mm/filemap.c
index c76d6a251770..7c4e439a2e85 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,

head = compound_head(page);

+ /*
+ * Check the locked pages before taking a reference to not
+ * go in the way of migration.
+ */
if (PageLocked(head))
goto next;
if (!page_cache_get_speculative(head))
--
Michal Hocko
SUSE Labs

2018-11-20 16:01:03

by Kirill A. Shutemov

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue, Nov 20, 2018 at 03:12:07PM +0100, Michal Hocko wrote:
> On Tue 20-11-18 17:07:15, Kirill A. Shutemov wrote:
> > On Tue, Nov 20, 2018 at 02:43:23PM +0100, Michal Hocko wrote:
> > > From: Michal Hocko <[email protected]>
> > >
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> >
> > Looks fine to me.
>
> Thanks for the review.
>
> > But please drop a line of comment in the code. As is it might be confusing
> > for a reader.
>
> This?

Yep.

Acked-by: Kirill A. Shutemov <[email protected]>

>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c76d6a251770..7c4e439a2e85 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2554,6 +2554,10 @@ void filemap_map_pages(struct vm_fault *vmf,
>
> head = compound_head(page);
>
> + /*
> + * Check the locked pages before taking a reference to not
> + * go in the way of migration.
> + */
> if (PageLocked(head))
> goto next;
> if (!page_cache_get_speculative(head))
> --
> Michal Hocko
> SUSE Labs

--
Kirill A. Shutemov

2018-11-20 16:01:06

by David Hildenbrand

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On 20.11.18 15:25, Michal Hocko wrote:
> On Tue 20-11-18 15:18:41, David Hildenbrand wrote:
> [...]
>> (we could also check for pending signals inside that function if really
>> required)
>
> do_migrate_pages is not the proper layer to check signals. Because the
> loop only isolates pages and that is not expensive. The most expensive
> part is deeper down in the migration core. We wait for page lock or
> writeback and that can take a long. None of that is killable wait which
> is a larger surgery but something that we should consider should there
> be any need to address this.

Indeed, that makes sense.

>
>> Reviewed-by: David Hildenbrand <[email protected]>
>
> Thanks!
>


--

Thanks,

David / dhildenb

2018-11-20 16:01:36

by Michal Hocko

Subject: Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining

On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
[...]
> > + do {
> > + for (pfn = start_pfn; pfn;)
> > + {
>
> { on a new line looks weird.
>
> > + /* start memory hot removal */
> > + ret = -EINTR;
>
> I think we can move that into the "if (signal_pending(current))"
>
> (if my eyes are not wrong, this will not be touched otherwise)

Better?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9cd161db3061..6bc3aee30f5e 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
}

do {
- for (pfn = start_pfn; pfn;)
- {
+ for (pfn = start_pfn; pfn;) {
/* start memory hot removal */
- ret = -EINTR;
if (signal_pending(current)) {
+ ret = -EINTR;
reason = "signal backoff";
goto failed_removal_isolated;
}
--
Michal Hocko
SUSE Labs

2018-11-20 16:01:57

by David Hildenbrand

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On 20.11.18 14:43, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>
> ---
> mm/filemap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
> goto next;
>
> head = compound_head(page);
> +
> + if (PageLocked(head))
> + goto next;
> if (!page_cache_get_speculative(head))
> goto next;
>
>

Right, no sense in referencing a page if we know we will drop the
reference right away.

Reviewed-by: David Hildenbrand <[email protected]>

--

Thanks,

David / dhildenb

2018-11-20 16:06:04

by Michal Hocko

Subject: Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining

On Tue 20-11-18 16:13:35, Oscar Salvador wrote:
>
> > Signed-off-by: Michal Hocko <[email protected]>
> [...]
> > + do {
> > + for (pfn = start_pfn; pfn;)
> > + {
> > + /* start memory hot removal */
>
> Should we change thAT comment? I mean, this is not really the hot-
> removal stage.
>
> Maybe "start memory migration" suits better? or memory offlining?

I will just drop it. It doesn't really explain anything.
[...]
>
> This indeed looks much nicer and it is easier to follow.
> With the changes commented by David:
>
> Reviewed-by: Oscar Salvador <[email protected]>

Thanks!

--
Michal Hocko
SUSE Labs
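
Putting the posted patch together with the fixups above (brace placement,
setting ret = -EINTR only under the signal check, and dropping the stale
"start memory hot removal" comment), the migration part of __offline_pages()
would end up looking roughly like this (a sketch of the expected repost,
which may of course still change):

        do {
                for (pfn = start_pfn; pfn;) {
                        if (signal_pending(current)) {
                                ret = -EINTR;
                                reason = "signal backoff";
                                goto failed_removal_isolated;
                        }

                        cond_resched();
                        lru_add_drain_all();
                        drain_all_pages(zone);

                        pfn = scan_movable_pages(pfn, end_pfn);
                        if (pfn) {
                                /* TODO fatal migration failures should bail out */
                                do_migrate_range(pfn, end_pfn);
                        }
                }

                /*
                 * dissolve free hugepages in the memory block before doing
                 * offlining actually in order to make hugetlbfs's object
                 * counting consistent.
                 */
                ret = dissolve_free_huge_pages(start_pfn, end_pfn);
                if (ret) {
                        reason = "failure to dissolve huge pages";
                        goto failed_removal_isolated;
                }
                /* check again */
                offlined_pages = check_pages_isolated(start_pfn, end_pfn);
        } while (offlined_pages < 0);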

2018-11-20 17:51:46

by David Hildenbrand

Subject: Re: [RFC PATCH 2/3] mm, memory_hotplug: deobfuscate migration part of offlining

On 20.11.18 15:34, Michal Hocko wrote:
> On Tue 20-11-18 15:26:43, David Hildenbrand wrote:
> [...]
>>> + do {
>>> + for (pfn = start_pfn; pfn;)
>>> + {
>>
>> { on a new line looks weird.
>>
>>> + /* start memory hot removal */
>>> + ret = -EINTR;
>>
>> I think we can move that into the "if (signal_pending(current))"
>>
>> (if my eyes are not wrong, this will not be touched otherwise)
>
> Better?
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9cd161db3061..6bc3aee30f5e 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1592,11 +1592,10 @@ static int __ref __offline_pages(unsigned long start_pfn,
> }
>
> do {
> - for (pfn = start_pfn; pfn;)
> - {
> + for (pfn = start_pfn; pfn;) {
> /* start memory hot removal */
> - ret = -EINTR;
> if (signal_pending(current)) {
> + ret = -EINTR;
> reason = "signal backoff";
> goto failed_removal_isolated;
> }
>

Reviewed-by: David Hildenbrand <[email protected]>

:)

--

Thanks,

David / dhildenb

2018-11-20 17:53:20

by Oscar Salvador

Subject: Re: [RFC PATCH 1/3] mm, memory_hotplug: try to migrate full section worth of pages

On Tue, 2018-11-20 at 14:43 +0100, Michal Hocko wrote:
> From: Michal Hocko <[email protected]>
>
> do_migrate_range has been limiting the number of pages to migrate to
> 256
> for some reason which is not documented.

When looking back at old memory-hotplug commits one feels pretty sad
about the brevity of the changelogs.

> Signed-off-by: Michal Hocko <[email protected]>

Reviewed-by: Oscar Salvador <[email protected]>

2018-11-21 01:55:30

by Hugh Dickins

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue, 20 Nov 2018, Michal Hocko wrote:

> From: Michal Hocko <[email protected]>
>
> filemap_map_pages takes a speculative reference to each page in the
> range before it tries to lock that page. While this is correct it
> also can influence page migration which will bail out when seeing
> an elevated reference count. The faultaround code would bail on
> seeing a locked page so we can pro-actively check the PageLocked
> bit before page_cache_get_speculative and prevent from pointless
> reference count churn.
>
> Cc: "Kirill A. Shutemov" <[email protected]>
> Suggested-by: Jan Kara <[email protected]>
> Signed-off-by: Michal Hocko <[email protected]>

Acked-by: Hugh Dickins <[email protected]>

though I think this patch is more useful to avoid the atomic ops,
and unnecessary dirtying of the cacheline, than to avoid the very
transient elevation of refcount, which will not affect page migration
very much.

> ---
> mm/filemap.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 81adec8ee02c..c76d6a251770 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2553,6 +2553,9 @@ void filemap_map_pages(struct vm_fault *vmf,
> goto next;
>
> head = compound_head(page);
> +
> + if (PageLocked(head))
> + goto next;
> if (!page_cache_get_speculative(head))
> goto next;
>
> --
> 2.19.1

2018-11-21 05:13:53

by William Kucharski

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page



> On Nov 20, 2018, at 7:12 AM, Michal Hocko <[email protected]> wrote:
>
> + /*
> + * Check the locked pages before taking a reference to not
> + * go in the way of migration.
> + */

Could you make this a tiny bit more explanatory, something like:

+ /*
+ * Check for a locked page first, as a speculative
+ * reference may adversely influence page migration.
+ */

Reviewed-by: William Kucharski <[email protected]>

2018-11-21 07:25:46

by Michal Hocko

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> On Tue, 20 Nov 2018, Michal Hocko wrote:
>
> > From: Michal Hocko <[email protected]>
> >
> > filemap_map_pages takes a speculative reference to each page in the
> > range before it tries to lock that page. While this is correct it
> > also can influence page migration which will bail out when seeing
> > an elevated reference count. The faultaround code would bail on
> > seeing a locked page so we can pro-actively check the PageLocked
> > bit before page_cache_get_speculative and prevent from pointless
> > reference count churn.
> >
> > Cc: "Kirill A. Shutemov" <[email protected]>
> > Suggested-by: Jan Kara <[email protected]>
> > Signed-off-by: Michal Hocko <[email protected]>
>
> Acked-by: Hugh Dickins <[email protected]>

Thanks!

> though I think this patch is more useful to the avoid atomic ops,
> and unnecessary dirtying of the cacheline, than to avoid the very
> transient elevation of refcount, which will not affect page migration
> very much.

Are you sure it would really be transient? In other words, is it possible
that fault-around can block migration repeatedly under a refault-heavy
workload? I just couldn't convince myself, to be honest.
--
Michal Hocko
SUSE Labs

2018-11-21 07:32:50

by Michal Hocko

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Tue 20-11-18 21:51:39, William Kucharski wrote:
>
>
> > On Nov 20, 2018, at 7:12 AM, Michal Hocko <[email protected]> wrote:
> >
> > + /*
> > + * Check the locked pages before taking a reference to not
> > + * go in the way of migration.
> > + */
>
> Could you make this a tiny bit more explanative, something like:
>
> + /*
> + * Check for a locked page first, as a speculative
> + * reference may adversely influence page migration.
> + */

sure

>
> Reviewed-by: William Kucharski <[email protected]>

Thanks!

--
Michal Hocko
SUSE Labs

2018-11-22 14:05:38

by Hugh Dickins

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Wed, 21 Nov 2018, Michal Hocko wrote:
> On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > On Tue, 20 Nov 2018, Michal Hocko wrote:
> >
> > > From: Michal Hocko <[email protected]>
> > >
> > > filemap_map_pages takes a speculative reference to each page in the
> > > range before it tries to lock that page. While this is correct it
> > > also can influence page migration which will bail out when seeing
> > > an elevated reference count. The faultaround code would bail on
> > > seeing a locked page so we can pro-actively check the PageLocked
> > > bit before page_cache_get_speculative and prevent from pointless
> > > reference count churn.
> > >
> > > Cc: "Kirill A. Shutemov" <[email protected]>
> > > Suggested-by: Jan Kara <[email protected]>
> > > Signed-off-by: Michal Hocko <[email protected]>
> >
> > Acked-by: Hugh Dickins <[email protected]>
>
> Thanks!
>
> > though I think this patch is more useful to the avoid atomic ops,
> > and unnecessary dirtying of the cacheline, than to avoid the very
> > transient elevation of refcount, which will not affect page migration
> > very much.
>
> Are you sure it would really be transient? In other words is it possible
> that the fault around can block migration repeatedly under refault heavy
> workload? I just couldn't convince myself, to be honest.

I don't deny that it is possible: I expect that, using fork() (which does
not copy the ptes in a shared file vma), you can construct a test case
where each child faults one or another page near a page of no interest,
and that page of no interest is a target of migration perpetually
frustrated by filemap_map_pages()'s briefly raised refcount.

But I suggest that's a third-order effect: well worth fixing because
it's easily and uncontroversially dealt with, as you have; but not of
great importance.

The first-order effect is migration conspiring to defeat itself: that's
what my put_and_wait_on_page_locked() patch, in other thread, is about.

The second order effect is when a page that is really wanted is waited
on - the target of a fault, for which page refcount is raised maybe
long before it finally gets into the page table (whereupon it becomes
visible to try_to_unmap(), and its mapcount matches refcount so that
migration can fully account for the page). One class of that can be
well dealt with by using put_and_wait_on_page_locked_killable() in
lock_page_or_retry(), but I was keeping that as a future instalment.

But I shouldn't denigrate the transient case by referring so lightly
to migrate_pages()' 10 attempts: each of those failed attempts can
be very expensive, unmapping and TLB flushing (including IPIs) and
remapping. It may well be that 2 or 3 would be a more cost-effective
number of attempts, at least when the page is mapped.

Hugh
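
To spell out the first-order effect Hugh describes (an illustrative sketch
only; put_and_wait_on_page_locked() is the helper proposed in the other
thread and its final form is not settled here):

        /*
         * today: the waiter pins the page while it sleeps, and that very pin
         * keeps the refcount elevated so the migration attempt keeps failing
         */
        get_page(page);
        wait_on_page_locked(page);
        put_page(page);

        /* proposed: drop the pin before sleeping so migration can make progress */
        get_page(page);
        put_and_wait_on_page_locked(page);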

2018-11-23 06:32:09

by Michal Hocko

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Wed 21-11-18 18:27:11, Hugh Dickins wrote:
> On Wed, 21 Nov 2018, Michal Hocko wrote:
> > On Tue 20-11-18 17:47:21, Hugh Dickins wrote:
> > > On Tue, 20 Nov 2018, Michal Hocko wrote:
> > >
> > > > From: Michal Hocko <[email protected]>
> > > >
> > > > filemap_map_pages takes a speculative reference to each page in the
> > > > range before it tries to lock that page. While this is correct it
> > > > also can influence page migration which will bail out when seeing
> > > > an elevated reference count. The faultaround code would bail on
> > > > seeing a locked page so we can pro-actively check the PageLocked
> > > > bit before page_cache_get_speculative and prevent from pointless
> > > > reference count churn.
> > > >
> > > > Cc: "Kirill A. Shutemov" <[email protected]>
> > > > Suggested-by: Jan Kara <[email protected]>
> > > > Signed-off-by: Michal Hocko <[email protected]>
> > >
> > > Acked-by: Hugh Dickins <[email protected]>
> >
> > Thanks!
> >
> > > though I think this patch is more useful to the avoid atomic ops,
> > > and unnecessary dirtying of the cacheline, than to avoid the very
> > > transient elevation of refcount, which will not affect page migration
> > > very much.
> >
> > Are you sure it would really be transient? In other words is it possible
> > that the fault around can block migration repeatedly under refault heavy
> > workload? I just couldn't convince myself, to be honest.
>
> I don't deny that it is possible: I expect that, using fork() (which does
> not copy the ptes in a shared file vma), you can construct a test case
> where each child faults one or another page near a page of no interest,
> and that page of no interest is a target of migration perpetually
> frustrated by filemap_map_pages()'s briefly raised refcount.

The other issue I am debugging, which very likely has the same
underlying cause in the end, has shown the following pattern:
[ 883.930477] rac1 kernel: page:ffffea2084bf5cc0 count:1889 mapcount:1887 mapping:ffff8833c82c9ad8 index:0x6b
[ 883.930485] rac1 kernel: ext4_da_aops [ext4]
[ 883.930497] rac1 kernel: name:"libc-2.22.so"
[ 883.931241] rac1 kernel: do_migrate_range done ret=23

After we disabled the faultaround code, the failure moved to a different
page and libc hasn't shown up again. This might be a matter of (bad) luck
and timing, but we thought it is not too unlikely for faultaround on such
a shared page to get in the way.

> But I suggest that's a third-order effect: well worth fixing because
> it's easily and uncontroversially dealt with, as you have; but not of
> great importance.
>
> The first-order effect is migration conspiring to defeat itself: that's
> what my put_and_wait_on_page_locked() patch, in other thread, is about.

yes. That is obviously a much more effective fix.

> The second order effect is when a page that is really wanted is waited
> on - the target of a fault, for which page refcount is raised maybe
> long before it finally gets into the page table (whereupon it becomes
> visible to try_to_unmap(), and its mapcount matches refcount so that
> migration can fully account for the page). One class of that can be
> well dealt with by using put_and_wait_on_page_locked_killable() in
> lock_page_or_retry(), but I was keeping that as a future instalment.
>
> But I shouldn't denigrate the transient case by referring so lightly
> to migrate_pages()' 10 attempts: each of those failed attempts can
> be very expensive, unmapping and TLB flushing (including IPIs) and
> remapping. It may well be that 2 or 3 would be a more cost-effective
> number of attempts, at least when the page is mapped.

If you want some update to the comment in this function or to the
changelog, I am open of course. Right now I have
+ * Check for a locked page first, as a speculative
+ * reference may adversely influence page migration.
as suggested by William.
--
Michal Hocko
SUSE Labs
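
With William's wording folded in, the fault-around hunk is expected to end
up looking roughly like this (a sketch; the final repost may differ):

        head = compound_head(page);

        /*
         * Check for a locked page first, as a speculative
         * reference may adversely influence page migration.
         */
        if (PageLocked(head))
                goto next;
        if (!page_cache_get_speculative(head))
                goto next;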

2018-11-24 08:52:21

by Hugh Dickins

Subject: Re: [RFC PATCH 3/3] mm, fault_around: do not take a reference to a locked page

On Thu, 22 Nov 2018, Michal Hocko wrote:
>
> If you want some update to the comment in this function or to the
> changelog, I am open of course. Right now I have
> + * Check for a locked page first, as a speculative
> + * reference may adversely influence page migration.
> as suggested by William.

I ought to care, since I challenged the significance of this aspect
in the first place, but find I don't care enough - I much prefer the
patch to the comments on and in it, but have not devised any wording
that I'd prefer to see instead - sorry.

Hugh