Hi All,
Under heavy stress and constant memory hot add/remove, I have observed
the following loop to occasionally loop infinitely:
mm/memory_hotplug.c:__offline_pages
repeat:
/* start memory hot removal */
ret = -EINTR;
if (signal_pending(current))
goto failed_removal;
cond_resched();
lru_add_drain_all();
drain_all_pages(zone);
pfn = scan_movable_pages(start_pfn, end_pfn);
if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
goto repeat;
}
What appears to be happening in this case is that do_migrate_range
returns a failure code which is being ignored. The failure is stemming
from migrate_pages returning "1" which I'm guessing is the result of us
hitting the following case:
mm/migrate.c: migrate_pages
default:
/*
* Permanent failure (-EBUSY, -ENOSYS, etc.):
* unlike -EAGAIN case, the failed page is
* removed from migration page list and not
* retried in the next outer loop.
*/
nr_failed++;
break;
}
Does a failure in do_migrate_range indicate that the range is
unmigratable and the loop in __offline_pages should terminate and goto
failed_removal? Or should we allow a certain number of retrys before we
give up on migrating the range?
This issue was observed on a ppc64le lpar on a 4.18-rc6 kernel.
-John
On Wed 25-07-18 13:11:15, John Allen wrote:
[...]
> Does a failure in do_migrate_range indicate that the range is unmigratable
> and the loop in __offline_pages should terminate and goto failed_removal? Or
> should we allow a certain number of retrys before we
> give up on migrating the range?
Unfortunatelly not. Migration code doesn't tell a difference between
ephemeral and permanent failures. We are relying on
start_isolate_page_range to tell us this. So the question is, what kind
of page is not migratable and for what reason.
Are you able to add some debugging to give us more information. The
current debugging code in the hotplug/migration sucks...
--
Michal Hocko
SUSE Labs
On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
>On Wed 25-07-18 13:11:15, John Allen wrote:
>[...]
>> Does a failure in do_migrate_range indicate that the range is unmigratable
>> and the loop in __offline_pages should terminate and goto failed_removal? Or
>> should we allow a certain number of retrys before we
>> give up on migrating the range?
>
>Unfortunatelly not. Migration code doesn't tell a difference between
>ephemeral and permanent failures. We are relying on
>start_isolate_page_range to tell us this. So the question is, what kind
>of page is not migratable and for what reason.
>
>Are you able to add some debugging to give us more information. The
>current debugging code in the hotplug/migration sucks...
After reproducing the problem a couple times, it seems that it can occur
for different types of pages. Running page-types on the offending page
over two separate instances produced the following:
# tools/vm/page-types -a 307968-308224
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000400 1 0 __________B________________________________ buddy
total 1 0
And the following on a separate run:
# tools/vm/page-types -a 313088-313344
flags page-count MB symbolic-flags long-symbolic-flags
0x000000000000006c 1 0 __RU_lA____________________________________ referenced,uptodate,lru,active
total 1 0
The source of the failure in migrate_pages actually doesn't seem to be
that we're hitting the case of the permanent failure, but instead the
-EAGAIN case. I traced the EAGAIN return back to
migrate_page_move_mapping which I've seen return EAGAIN in two places:
mm/migrate.c:453
if (!mapping) {
/* Anonymous page without mapping */
if (page_count(page) != expected_count)
return -EAGAIN;
mm/migrate.c:476
if (page_count(page) != expected_count ||
radix_tree_deref_slot_protected(pslot,
&mapping->i_pages.xa_lock) != page) {
xa_unlock_irq(&mapping->i_pages);
return -EAGAIN;
}
So it seems in each case, the actual reference count for the page is not
what it is expected to be.
On Fri 27-07-18 12:32:59, John Allen wrote:
> On Wed, Jul 25, 2018 at 10:03:36PM +0200, Michal Hocko wrote:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> > > Does a failure in do_migrate_range indicate that the range is unmigratable
> > > and the loop in __offline_pages should terminate and goto failed_removal? Or
> > > should we allow a certain number of retrys before we
> > > give up on migrating the range?
> >
> > Unfortunatelly not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> >
> > Are you able to add some debugging to give us more information. The
> > current debugging code in the hotplug/migration sucks...
>
> After reproducing the problem a couple times, it seems that it can occur for
> different types of pages. Running page-types on the offending page over two
> separate instances produced the following:
>
> # tools/vm/page-types -a 307968-308224
> flags page-count MB symbolic-flags long-symbolic-flags
> 0x0000000000000400 1 0 __________B________________________________ buddy
> total 1 0
Huh! How come a buddy page has non zero reference count.
>
> And the following on a separate run:
>
> # tools/vm/page-types -a 313088-313344
> flags page-count MB symbolic-flags long-symbolic-flags
> 0x000000000000006c 1 0 __RU_lA____________________________________ referenced,uptodate,lru,active
> total 1 0
Hmm, what is the expected page count in this case? Seeing 1 doesn't look
particularly wrong.
--
Michal Hocko
SUSE Labs
On 26/07/18 04:11, John Allen wrote:
> Hi All,
>
> Under heavy stress and constant memory hot add/remove, I have observed
> the following loop to occasionally loop infinitely:
>
> mm/memory_hotplug.c:__offline_pages
>
> repeat:
> /* start memory hot removal */
> ret = -EINTR;
> if (signal_pending(current))
> goto failed_removal;
>
> cond_resched();
> lru_add_drain_all();
> drain_all_pages(zone);
>
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
>
What is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE set to for you?
I have also observed this when hot removing and adding memory. However I
only have only seen this when my kernel has
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n (when it is set to online
automatically I do not have this issue) so I assumed that I wasn't
onlining the memory properly...
> What appears to be happening in this case is that do_migrate_range
> returns a failure code which is being ignored. The failure is stemming
> from migrate_pages returning "1" which I'm guessing is the result of
> us hitting the following case:
>
> mm/migrate.c: migrate_pages
>
> default:
> /*
> * Permanent failure (-EBUSY, -ENOSYS, etc.):
> * unlike -EAGAIN case, the failed page is
> * removed from migration page list and not
> * retried in the next outer loop.
> */
> nr_failed++;
> break;
> }
>
> Does a failure in do_migrate_range indicate that the range is
> unmigratable and the loop in __offline_pages should terminate and goto
> failed_removal? Or should we allow a certain number of retrys before we
> give up on migrating the range?
>
> This issue was observed on a ppc64le lpar on a 4.18-rc6 kernel.
>
> -John
>
Michal Hocko <[email protected]> writes:
> On Wed 25-07-18 13:11:15, John Allen wrote:
> [...]
>> Does a failure in do_migrate_range indicate that the range is unmigratable
>> and the loop in __offline_pages should terminate and goto failed_removal? Or
>> should we allow a certain number of retrys before we
>> give up on migrating the range?
>
> Unfortunatelly not. Migration code doesn't tell a difference between
> ephemeral and permanent failures.
What's to stop an ephemeral failure happening repeatedly?
cheers
On Wed 01-08-18 21:09:39, Michael Ellerman wrote:
> Michal Hocko <[email protected]> writes:
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> >> Does a failure in do_migrate_range indicate that the range is unmigratable
> >> and the loop in __offline_pages should terminate and goto failed_removal? Or
> >> should we allow a certain number of retrys before we
> >> give up on migrating the range?
> >
> > Unfortunatelly not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures.
>
> What's to stop an ephemeral failure happening repeatedly?
If there is a short term pin on the page that prevents the migration
then the holder of the pin should realease it and the next retry will
succeed the migration. If the page gets freed on the way then it will
not be reallocated because they are isolated already. I can only see
complete OOM to be the reason to fail allocation of the target place
as the migration failure and that is highly unlikely and sooner or later
trigger the oom killer and release some memory.
The biggest problem here is that we cannot tell ephemeral and long term
pins...
--
Michal Hocko
SUSE Labs
Hi Michal,
Michal Hocko <[email protected]> writes:
> On Wed 25-07-18 13:11:15, John Allen wrote:
> [...]
>> Does a failure in do_migrate_range indicate that the range is unmigratable
>> and the loop in __offline_pages should terminate and goto failed_removal? Or
>> should we allow a certain number of retrys before we
>> give up on migrating the range?
>
> Unfortunatelly not. Migration code doesn't tell a difference between
> ephemeral and permanent failures. We are relying on
> start_isolate_page_range to tell us this. So the question is, what kind
> of page is not migratable and for what reason.
>
> Are you able to add some debugging to give us more information. The
> current debugging code in the hotplug/migration sucks...
Haren did most of the debugging, so at minimum we need a patch like this
I guess.
modified mm/page_alloc.c
@@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
* handle each tail page individually in migration.
*/
if (PageHuge(page)) {
+
+ if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
+ goto unmovable;
+
iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
continue;
}
The problem is start_isolate_range, doesn't look at hugetlbpage and
return error if the architecture didn't support HUGEPAGE migration.
Now discussing with Naoya, I was suggsting whether we should add a
similar check in
modified mm/memory_hotplug.c
@@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
return pfn;
if (__PageMovable(page))
return pfn;
- if (PageHuge(page)) {
+ if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
+ PageHuge(page)) {
if (page_huge_active(page))
return pfn;
One of the thinking there is it possible to get new hugetlb pages
allocated in that range after start_isolate_range ran. But i guess since
we marked all the free pages as MIGRATE_ISOLATE that is not possible?
But then it is good to have scan_movable_pages also check for
HUGEPAGE_MIGRATION?
Complete patch below.
commit 2e9d754ac211f2af3731f15df3cd8cd070b4cc54
Author: Aneesh Kumar K.V <[email protected]>
Date: Tue Aug 21 14:17:55 2018 +0530
mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
When scanning for movable pages, filter out Hugetlb pages if hugepage migration
is not supported. Without this we hit infinte loop in __offline pages where we
do
pfn = scan_movable_pages(start_pfn, end_pfn);
if (pfn) { /* We have movable pages */
ret = do_migrate_range(pfn, end_pfn);
goto repeat;
}
We do support hugetlb migration ony if the hugetlb pages are at pmd level. Here
we just check for Kernel config. The gigantic page size check is done in
page_huge_active.
Reported-by: Haren Myneni <[email protected]>
CC: Naoya Horiguchi <[email protected]>
Signed-off-by: Aneesh Kumar K.V <[email protected]>
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4eb6e824a80c..f9bdea685cf4 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
return pfn;
if (__PageMovable(page))
return pfn;
- if (PageHuge(page)) {
+ if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
+ PageHuge(page)) {
if (page_huge_active(page))
return pfn;
else
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 15ea511fb41c..a3f81e18c882 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
* handle each tail page individually in migration.
*/
if (PageHuge(page)) {
+
+ if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
+ goto unmovable;
+
iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
continue;
}
On Wed 22-08-18 15:00:18, Aneesh Kumar K.V wrote:
>
> Hi Michal,
>
> Michal Hocko <[email protected]> writes:
>
> > On Wed 25-07-18 13:11:15, John Allen wrote:
> > [...]
> >> Does a failure in do_migrate_range indicate that the range is unmigratable
> >> and the loop in __offline_pages should terminate and goto failed_removal? Or
> >> should we allow a certain number of retrys before we
> >> give up on migrating the range?
> >
> > Unfortunatelly not. Migration code doesn't tell a difference between
> > ephemeral and permanent failures. We are relying on
> > start_isolate_page_range to tell us this. So the question is, what kind
> > of page is not migratable and for what reason.
> >
> > Are you able to add some debugging to give us more information. The
> > current debugging code in the hotplug/migration sucks...
>
> Haren did most of the debugging, so at minimum we need a patch like this
> I guess.
>
> modified mm/page_alloc.c
> @@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * handle each tail page individually in migration.
> */
> if (PageHuge(page)) {
> +
> + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> + goto unmovable;
> +
> iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
> continue;
> }
>
>
> The problem is start_isolate_range, doesn't look at hugetlbpage and
> return error if the architecture didn't support HUGEPAGE migration.
Yes this makes sense. I didn't really have arches without huge page
migration in mind.
> Now discussing with Naoya, I was suggsting whether we should add a
> similar check in
>
> modified mm/memory_hotplug.c
> @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> return pfn;
> if (__PageMovable(page))
> return pfn;
> - if (PageHuge(page)) {
> + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
> + PageHuge(page)) {
> if (page_huge_active(page))
> return pfn;
>
> One of the thinking there is it possible to get new hugetlb pages
> allocated in that range after start_isolate_range ran. But i guess since
> we marked all the free pages as MIGRATE_ISOLATE that is not possible?
I do not follow. You are usually allocating new pages outside of the
offlined range. But the above change makes sense because it doesn't
really make sense to migrate pfn if it is backed by a non-migrateable
huge page. dissolve_free_huge_pages should then try to remove it
completely and the offlining fails if that is not possible.
> But then it is good to have scan_movable_pages also check for
> HUGEPAGE_MIGRATION?
Good question. It is not necessary right now because has_unmovable_pages
called earlier should take care of it. But I guess it will not hurt to
have it there as well for the clarity.
>
> Complete patch below.
>
> commit 2e9d754ac211f2af3731f15df3cd8cd070b4cc54
> Author: Aneesh Kumar K.V <[email protected]>
> Date: Tue Aug 21 14:17:55 2018 +0530
>
> mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
>
> When scanning for movable pages, filter out Hugetlb pages if hugepage migration
> is not supported. Without this we hit infinte loop in __offline pages where we
> do
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
>
> We do support hugetlb migration ony if the hugetlb pages are at pmd level. Here
> we just check for Kernel config. The gigantic page size check is done in
> page_huge_active.
That being said. The patch makes sense to me.
>
> Reported-by: Haren Myneni <[email protected]>
> CC: Naoya Horiguchi <[email protected]>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
I guess we should mark it for stable even though I am not sure how often
do we see CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION=n
Acked-by: Michal Hocko <[email protected]>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4eb6e824a80c..f9bdea685cf4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> return pfn;
> if (__PageMovable(page))
> return pfn;
> - if (PageHuge(page)) {
> + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
> + PageHuge(page)) {
> if (page_huge_active(page))
> return pfn;
> else
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 15ea511fb41c..a3f81e18c882 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * handle each tail page individually in migration.
> */
> if (PageHuge(page)) {
> +
> + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> + goto unmovable;
> +
> iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
> continue;
> }
--
Michal Hocko
SUSE Labs
On 08/22/2018 02:30 AM, Aneesh Kumar K.V wrote:
> commit 2e9d754ac211f2af3731f15df3cd8cd070b4cc54
> Author: Aneesh Kumar K.V <[email protected]>
> Date: Tue Aug 21 14:17:55 2018 +0530
>
> mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
>
> When scanning for movable pages, filter out Hugetlb pages if hugepage migration
> is not supported. Without this we hit infinte loop in __offline pages where we
> do
> pfn = scan_movable_pages(start_pfn, end_pfn);
> if (pfn) { /* We have movable pages */
> ret = do_migrate_range(pfn, end_pfn);
> goto repeat;
> }
>
> We do support hugetlb migration ony if the hugetlb pages are at pmd level. Here
I thought migration at pgd level was added for POWER? commit 94310cbcaa3c
(mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level).
Only remember, because I did not fully understand the use case. :)
> we just check for Kernel config. The gigantic page size check is done in
> page_huge_active.
>
> Reported-by: Haren Myneni <[email protected]>
> CC: Naoya Horiguchi <[email protected]>
> Signed-off-by: Aneesh Kumar K.V <[email protected]>
>
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 4eb6e824a80c..f9bdea685cf4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> return pfn;
> if (__PageMovable(page))
> return pfn;
> - if (PageHuge(page)) {
> + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
> + PageHuge(page)) {
How about using hugepage_migration_supported instead? It would automatically
catch those non-migratable huge page sizes. Something like:
if (PageHuge(page) &&
hugepage_migration_supported(page_hstate(page))) {
--
Mike Kravetz
> if (page_huge_active(page))
> return pfn;
> else
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 15ea511fb41c..a3f81e18c882 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -7649,6 +7649,10 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
> * handle each tail page individually in migration.
> */
> if (PageHuge(page)) {
> +
> + if (!IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION))
> + goto unmovable;
> +
> iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
> continue;
> }
>
On 08/23/2018 12:28 AM, Mike Kravetz wrote:
> On 08/22/2018 02:30 AM, Aneesh Kumar K.V wrote:
>> commit 2e9d754ac211f2af3731f15df3cd8cd070b4cc54
>> Author: Aneesh Kumar K.V <[email protected]>
>> Date: Tue Aug 21 14:17:55 2018 +0530
>>
>> mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
>>
>> When scanning for movable pages, filter out Hugetlb pages if hugepage migration
>> is not supported. Without this we hit infinte loop in __offline pages where we
>> do
>> pfn = scan_movable_pages(start_pfn, end_pfn);
>> if (pfn) { /* We have movable pages */
>> ret = do_migrate_range(pfn, end_pfn);
>> goto repeat;
>> }
>>
>> We do support hugetlb migration ony if the hugetlb pages are at pmd level. Here
>
> I thought migration at pgd level was added for POWER? commit 94310cbcaa3c
> (mm/madvise: enable (soft|hard) offline of HugeTLB pages at PGD level).
> Only remember, because I did not fully understand the use case. :)
>
yes. We hit the issue on older distro kernels.
>> we just check for Kernel config. The gigantic page size check is done in
>> page_huge_active.
>>
>> Reported-by: Haren Myneni <[email protected]>
>> CC: Naoya Horiguchi <[email protected]>
>> Signed-off-by: Aneesh Kumar K.V <[email protected]>
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 4eb6e824a80c..f9bdea685cf4 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
>> return pfn;
>> if (__PageMovable(page))
>> return pfn;
>> - if (PageHuge(page)) {
>> + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
>> + PageHuge(page)) {
>
> How about using hugepage_migration_supported instead? It would automatically
> catch those non-migratable huge page sizes. Something like:
>
Will do that.
> if (PageHuge(page) &&
> hugepage_migration_supported(page_hstate(page))) {
>
-aneesh
On Wed 22-08-18 11:58:02, Mike Kravetz wrote:
> On 08/22/2018 02:30 AM, Aneesh Kumar K.V wrote:
[...]
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 4eb6e824a80c..f9bdea685cf4 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1338,7 +1338,8 @@ static unsigned long scan_movable_pages(unsigned long start, unsigned long end)
> > return pfn;
> > if (__PageMovable(page))
> > return pfn;
> > - if (PageHuge(page)) {
> > + if (IS_ENABLED(CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION) &&
> > + PageHuge(page)) {
>
> How about using hugepage_migration_supported instead? It would automatically
> catch those non-migratable huge page sizes. Something like:
>
> if (PageHuge(page) &&
> hugepage_migration_supported(page_hstate(page))) {
Ohh, definitely, this is much better.
--
Michal Hocko
SUSE Labs