This has been posted as an RFC [1]. There was a general agreement for
these patches. I hope I have addressed all the review feedback.
Original cover:
I have been chasing memory offlining not making progress recently. On
the way I have noticed few weird decisions in the code. The migration
itself is restricted without a reasonable justification and the retry
loop around the migration is quite messy. This is addressed by patch 1
and patch 2.
Patch 3 is targeting on the faultaround code which has been a hot
candidate for the initial issue reported upstream [2] and that I am
debugging internally. It turned out to be not the main contributor
in the end but I believe we should address it regardless. See the patch
description for more details.
[1] http://lkml.kernel.org/r/[email protected]
[2] http://lkml.kernel.org/r/20181114070909.GB2653@MiWiFi-R3L-srv
From: Michal Hocko <[email protected]>
do_migrate_range has been limiting the number of pages to migrate to 256
for some reason which is not documented. Even if the limit made some
sense back then when it was introduced it doesn't really serve a good
purpose these days. If the range contains huge pages then
we break out of the loop too early and go through LRU and pcp
caches draining and scan_movable_pages is quite suboptimal.
The only reason to limit the number of pages I can think of is to reduce
the potential time to react on the fatal signal. But even then the
number of pages is a questionable metric because even a single page
might migration block in a non-killable state (e.g. __unmap_and_move).
Remove the limit and offline the full requested range (this is one
membblock worth of pages with the current code). Should we ever get a
report that offlining takes too long to react on fatal signal then we
should rather fix the core migration to use killable waits and bailout
on a signal.
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memory_hotplug.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index c82193db4be6..6263c8cd4491 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
return new_page_nodemask(page, nid, &nmask);
}
-#define NR_OFFLINE_AT_ONCE_PAGES (256)
static int
do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
{
unsigned long pfn;
struct page *page;
- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
int not_managed = 0;
int ret = 0;
LIST_HEAD(source);
- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
if (!pfn_valid(pfn))
continue;
page = pfn_to_page(pfn);
@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
ret = -EBUSY;
break;
}
- if (isolate_huge_page(page, &source))
- move_pages -= 1 << compound_order(head);
+ isolate_huge_page(page, &source);
continue;
} else if (PageTransHuge(page))
pfn = page_to_pfn(compound_head(page))
@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
if (!ret) { /* Success */
put_page(page);
list_add_tail(&page->lru, &source);
- move_pages--;
if (!__PageMovable(page))
inc_node_page_state(page, NR_ISOLATED_ANON +
page_is_file_cache(page));
--
2.19.2
From: Michal Hocko <[email protected]>
filemap_map_pages takes a speculative reference to each page in the
range before it tries to lock that page. While this is correct it
also can influence page migration which will bail out when seeing
an elevated reference count. The faultaround code would bail on
seeing a locked page so we can pro-actively check the PageLocked
bit before page_cache_get_speculative and prevent from pointless
reference count churn.
Suggested-by: Jan Kara <[email protected]>
Acked-by: Kirill A. Shutemov <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Reviewed-by: William Kucharski <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/filemap.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index 81adec8ee02c..a87f71fff879 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2553,6 +2553,13 @@ void filemap_map_pages(struct vm_fault *vmf,
goto next;
head = compound_head(page);
+
+ /*
+ * Check for a locked page first, as a speculative
+ * reference may adversely influence page migration.
+ */
+ if (PageLocked(head))
+ goto next;
if (!page_cache_get_speculative(head))
goto next;
--
2.19.2
From: Michal Hocko <[email protected]>
Memory migration might fail during offlining and we keep retrying in
that case. This is currently obfuscate by goto retry loop. The code
is hard to follow and as a result it is even suboptimal becase each
retry round scans the full range from start_pfn even though we have
successfully scanned/migrated [start_pfn, pfn] range already. This
is all only because check_pages_isolated failure has to rescan the full
range again.
De-obfuscate the migration retry loop by promoting it to a real for
loop. In fact remove the goto altogether by making it a proper double
loop (yeah, gotos are nasty in this specific case). In the end we
will get a slightly more optimal code which is better readable.
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Signed-off-by: Michal Hocko <[email protected]>
---
mm/memory_hotplug.c | 58 ++++++++++++++++++++++-----------------------
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 6263c8cd4491..c6c42a7425e5 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1591,38 +1591,38 @@ static int __ref __offline_pages(unsigned long start_pfn,
goto failed_removal_isolated;
}
- pfn = start_pfn;
-repeat:
- /* start memory hot removal */
- ret = -EINTR;
- if (signal_pending(current)) {
- reason = "signal backoff";
- goto failed_removal_isolated;
- }
+ do {
+ for (pfn = start_pfn; pfn;) {
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ reason = "signal backoff";
+ goto failed_removal_isolated;
+ }
- cond_resched();
- lru_add_drain_all();
- drain_all_pages(zone);
+ cond_resched();
+ lru_add_drain_all();
+ drain_all_pages(zone);
- pfn = scan_movable_pages(start_pfn, end_pfn);
- if (pfn) { /* We have movable pages */
- ret = do_migrate_range(pfn, end_pfn);
- goto repeat;
- }
+ pfn = scan_movable_pages(pfn, end_pfn);
+ if (pfn) {
+ /* TODO fatal migration failures should bail out */
+ do_migrate_range(pfn, end_pfn);
+ }
+ }
+
+ /*
+ * dissolve free hugepages in the memory block before doing offlining
+ * actually in order to make hugetlbfs's object counting consistent.
+ */
+ ret = dissolve_free_huge_pages(start_pfn, end_pfn);
+ if (ret) {
+ reason = "failure to dissolve huge pages";
+ goto failed_removal_isolated;
+ }
+ /* check again */
+ offlined_pages = check_pages_isolated(start_pfn, end_pfn);
+ } while (offlined_pages < 0);
- /*
- * dissolve free hugepages in the memory block before doing offlining
- * actually in order to make hugetlbfs's object counting consistent.
- */
- ret = dissolve_free_huge_pages(start_pfn, end_pfn);
- if (ret) {
- reason = "failure to dissolve huge pages";
- goto failed_removal_isolated;
- }
- /* check again */
- offlined_pages = check_pages_isolated(start_pfn, end_pfn);
- if (offlined_pages < 0)
- goto repeat;
pr_info("Offlined Pages %ld\n", offlined_pages);
/* Ok, all of our target is isolated.
We cannot do rollback at this point. */
--
2.19.2
On Tue, Dec 11, 2018 at 03:27:39PM +0100, Michal Hocko wrote:
>From: Michal Hocko <[email protected]>
>
>do_migrate_range has been limiting the number of pages to migrate to 256
>for some reason which is not documented. Even if the limit made some
>sense back then when it was introduced it doesn't really serve a good
>purpose these days. If the range contains huge pages then
>we break out of the loop too early and go through LRU and pcp
>caches draining and scan_movable_pages is quite suboptimal.
>
>The only reason to limit the number of pages I can think of is to reduce
>the potential time to react on the fatal signal. But even then the
>number of pages is a questionable metric because even a single page
>might migration block in a non-killable state (e.g. __unmap_and_move).
>
>Remove the limit and offline the full requested range (this is one
>membblock worth of pages with the current code). Should we ever get a
s/membblock/memblock/
Or memory block is more accurate? May memblock confuse audience with
lower level facility?
>report that offlining takes too long to react on fatal signal then we
>should rather fix the core migration to use killable waits and bailout
>on a signal.
>
>Reviewed-by: David Hildenbrand <[email protected]>
>Reviewed-by: Pavel Tatashin <[email protected]>
>Reviewed-by: Oscar Salvador <[email protected]>
>Signed-off-by: Michal Hocko <[email protected]>
>---
> mm/memory_hotplug.c | 8 ++------
> 1 file changed, 2 insertions(+), 6 deletions(-)
>
>diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>index c82193db4be6..6263c8cd4491 100644
>--- a/mm/memory_hotplug.c
>+++ b/mm/memory_hotplug.c
>@@ -1339,18 +1339,16 @@ static struct page *new_node_page(struct page *page, unsigned long private)
> return new_page_nodemask(page, nid, &nmask);
> }
>
>-#define NR_OFFLINE_AT_ONCE_PAGES (256)
> static int
> do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> {
> unsigned long pfn;
> struct page *page;
>- int move_pages = NR_OFFLINE_AT_ONCE_PAGES;
> int not_managed = 0;
> int ret = 0;
> LIST_HEAD(source);
>
>- for (pfn = start_pfn; pfn < end_pfn && move_pages > 0; pfn++) {
>+ for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> if (!pfn_valid(pfn))
> continue;
> page = pfn_to_page(pfn);
>@@ -1362,8 +1360,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> ret = -EBUSY;
> break;
> }
>- if (isolate_huge_page(page, &source))
>- move_pages -= 1 << compound_order(head);
>+ isolate_huge_page(page, &source);
> continue;
> } else if (PageTransHuge(page))
> pfn = page_to_pfn(compound_head(page))
>@@ -1382,7 +1379,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> if (!ret) { /* Success */
> put_page(page);
> list_add_tail(&page->lru, &source);
>- move_pages--;
> if (!__PageMovable(page))
> inc_node_page_state(page, NR_ISOLATED_ANON +
> page_is_file_cache(page));
>--
>2.19.2
--
Wei Yang
Help you, Help me