2017-07-26 10:50:09

by Michal Hocko

[permalink] [raw]
Subject: gigantic hugepages vs. movable zones

Hi,
I've just noticed that alloc_gigantic_page ignores movability of the
gigantic page and it uses any existing zone. Considering that
hugepage_migration_supported only supports 2MB and pgd level hugepages
then 1GB pages are not migratable and as such allocating them from a
movable zone will break the basic expectation of this zone. Standard
hugetlb allocations try to avoid that by using htlb_alloc_mask and I
believe we should do the same for gigantic pages as well.

I suspect this behavior is not intentional. What do you think about the
following untested patch?
---
>From 542d32c1eca7dcf38afca1a91bca4a472f6e8651 Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Wed, 26 Jul 2017 12:43:43 +0200
Subject: [PATCH] mm, hugetlb: do not allocate non-migrateable gigantic pages
from movable zones

alloc_gigantic_page doesn't consider movability of the gigantic hugetlb
when scanning eligible ranges for the allocation. As 1GB hugetlb pages
are not movable currently this can break the movable zone assumption
that all allocations are migrateable and as such break memory hotplug.

Reorganize the code and use the standard zonelist allocations scheme
that we use for standard hugetbl pages. htlb_alloc_mask will ensure that
only migratable hugetlb pages will ever see a movable zone.

Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Michal Hocko <[email protected]>
---
mm/hugetlb.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bc48ee783dd9..60530bb3d228 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1066,11 +1066,11 @@ static void free_gigantic_page(struct page *page, unsigned int order)
}

static int __alloc_gigantic_page(unsigned long start_pfn,
- unsigned long nr_pages)
+ unsigned long nr_pages, gfp_t gfp_mask)
{
unsigned long end_pfn = start_pfn + nr_pages;
return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
- GFP_KERNEL);
+ gfp_mask);
}

static bool pfn_range_valid_gigantic(struct zone *z,
@@ -1108,19 +1108,24 @@ static bool zone_spans_last_pfn(const struct zone *zone,
return zone_spans_pfn(zone, last_pfn);
}

-static struct page *alloc_gigantic_page(int nid, unsigned int order)
+static struct page *alloc_gigantic_page(int nid, struct hstate *h)
{
+ unsigned int order = huge_page_order(h);
unsigned long nr_pages = 1 << order;
unsigned long ret, pfn, flags;
- struct zone *z;
+ struct zonelist *zonelist;
+ struct zone *zone;
+ struct zoneref *z;
+ gfp_t gfp_mask;

- z = NODE_DATA(nid)->node_zones;
- for (; z - NODE_DATA(nid)->node_zones < MAX_NR_ZONES; z++) {
- spin_lock_irqsave(&z->lock, flags);
+ gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
+ zonelist = node_zonelist(nid, gfp_mask);
+ for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), NULL) {
+ spin_lock_irqsave(&zone->lock, flags);

- pfn = ALIGN(z->zone_start_pfn, nr_pages);
- while (zone_spans_last_pfn(z, pfn, nr_pages)) {
- if (pfn_range_valid_gigantic(z, pfn, nr_pages)) {
+ pfn = ALIGN(zone->zone_start_pfn, nr_pages);
+ while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
+ if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) {
/*
* We release the zone lock here because
* alloc_contig_range() will also lock the zone
@@ -1128,16 +1133,16 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)
* spinning on this lock, it may win the race
* and cause alloc_contig_range() to fail...
*/
- spin_unlock_irqrestore(&z->lock, flags);
- ret = __alloc_gigantic_page(pfn, nr_pages);
+ spin_unlock_irqrestore(&zone->lock, flags);
+ ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
if (!ret)
return pfn_to_page(pfn);
- spin_lock_irqsave(&z->lock, flags);
+ spin_lock_irqsave(&zone->lock, flags);
}
pfn += nr_pages;
}

- spin_unlock_irqrestore(&z->lock, flags);
+ spin_unlock_irqrestore(&zone->lock, flags);
}

return NULL;
@@ -1150,7 +1155,7 @@ static struct page *alloc_fresh_gigantic_page_node(struct hstate *h, int nid)
{
struct page *page;

- page = alloc_gigantic_page(nid, huge_page_order(h));
+ page = alloc_gigantic_page(nid, h);
if (page) {
prep_compound_gigantic_page(page, huge_page_order(h));
prep_new_huge_page(h, page, nid);
--
2.13.2

--
Michal Hocko
SUSE Labs


2017-07-27 02:22:40

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

Michal Hocko <[email protected]> writes:

> Hi,
> I've just noticed that alloc_gigantic_page ignores movability of the
> gigantic page and it uses any existing zone. Considering that
> hugepage_migration_supported only supports 2MB and pgd level hugepages
> then 1GB pages are not migratable and as such allocating them from a
> movable zone will break the basic expectation of this zone. Standard
> hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> believe we should do the same for gigantic pages as well.
>
> I suspect this behavior is not intentional. What do you think about the
> following untested patch?


I also noticed an unrelated issue with the usage of
start_isolate_page_range. On error we set the migrate type to
MIGRATE_MOVABLE. That may conflict with CMA pages ? Wondering whether
we should check for page's pageblock migrate type in
pfn_range_valid_gigantic() ?

-aneesh

2017-07-27 07:29:04

by Michal Hocko

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

On Thu 27-07-17 07:52:08, Aneesh Kumar K.V wrote:
> Michal Hocko <[email protected]> writes:
>
> > Hi,
> > I've just noticed that alloc_gigantic_page ignores movability of the
> > gigantic page and it uses any existing zone. Considering that
> > hugepage_migration_supported only supports 2MB and pgd level hugepages
> > then 1GB pages are not migratable and as such allocating them from a
> > movable zone will break the basic expectation of this zone. Standard
> > hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> > believe we should do the same for gigantic pages as well.
> >
> > I suspect this behavior is not intentional. What do you think about the
> > following untested patch?
>
>
> I also noticed an unrelated issue with the usage of
> start_isolate_page_range. On error we set the migrate type to
> MIGRATE_MOVABLE.

Why that should be a problem? I think it is perfectly OK to have
MIGRATE_MOVABLE pageblocks inside kernel zones.

> That may conflict with CMA pages ?

How?

> Wondering whether we should check for page's pageblock migrate type in
> pfn_range_valid_gigantic() ?

I do not think so. Migrate type is just too lowlevel for
pfn_range_valid_gigantic. If something like that is really needed then
it should go down the CMA/alloc_contig_range path.
--
Michal Hocko
SUSE Labs

2017-07-27 08:00:39

by Aneesh Kumar K.V

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones



On 07/27/2017 12:58 PM, Michal Hocko wrote:
> On Thu 27-07-17 07:52:08, Aneesh Kumar K.V wrote:
>> Michal Hocko <[email protected]> writes:
>>
>>> Hi,
>>> I've just noticed that alloc_gigantic_page ignores movability of the
>>> gigantic page and it uses any existing zone. Considering that
>>> hugepage_migration_supported only supports 2MB and pgd level hugepages
>>> then 1GB pages are not migratable and as such allocating them from a
>>> movable zone will break the basic expectation of this zone. Standard
>>> hugetlb allocations try to avoid that by using htlb_alloc_mask and I
>>> believe we should do the same for gigantic pages as well.
>>>
>>> I suspect this behavior is not intentional. What do you think about the
>>> following untested patch?
>>
>>
>> I also noticed an unrelated issue with the usage of
>> start_isolate_page_range. On error we set the migrate type to
>> MIGRATE_MOVABLE.
>
> Why that should be a problem? I think it is perfectly OK to have
> MIGRATE_MOVABLE pageblocks inside kernel zones.
>

we can pick pages with migrate type movable and if we fail to isolate
won't we set the migrate type of that pages to MOVABLE ?

-aneesh

2017-07-27 08:12:42

by Michal Hocko

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

On Thu 27-07-17 13:30:31, Aneesh Kumar K.V wrote:
>
>
> On 07/27/2017 12:58 PM, Michal Hocko wrote:
> >On Thu 27-07-17 07:52:08, Aneesh Kumar K.V wrote:
> >>Michal Hocko <[email protected]> writes:
> >>
> >>>Hi,
> >>>I've just noticed that alloc_gigantic_page ignores movability of the
> >>>gigantic page and it uses any existing zone. Considering that
> >>>hugepage_migration_supported only supports 2MB and pgd level hugepages
> >>>then 1GB pages are not migratable and as such allocating them from a
> >>>movable zone will break the basic expectation of this zone. Standard
> >>>hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> >>>believe we should do the same for gigantic pages as well.
> >>>
> >>>I suspect this behavior is not intentional. What do you think about the
> >>>following untested patch?
> >>
> >>
> >>I also noticed an unrelated issue with the usage of
> >>start_isolate_page_range. On error we set the migrate type to
> >>MIGRATE_MOVABLE.
> >
> >Why that should be a problem? I think it is perfectly OK to have
> >MIGRATE_MOVABLE pageblocks inside kernel zones.
> >
>
> we can pick pages with migrate type movable and if we fail to isolate won't
> we set the migrate type of that pages to MOVABLE ?

I do not see an immediate problem. GFP_KERNEL allocations can fallback
to movable migrate pageblocks AFAIR. But I am not very much familiar
with migratetypes. Vlastimil, could you have a look please?
--
Michal Hocko
SUSE Labs

2017-07-27 08:23:03

by Michal Hocko

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

[CC for real]

On Thu 27-07-17 10:12:36, Michal Hocko wrote:
> On Thu 27-07-17 13:30:31, Aneesh Kumar K.V wrote:
> >
> >
> > On 07/27/2017 12:58 PM, Michal Hocko wrote:
> > >On Thu 27-07-17 07:52:08, Aneesh Kumar K.V wrote:
> > >>Michal Hocko <[email protected]> writes:
> > >>
> > >>>Hi,
> > >>>I've just noticed that alloc_gigantic_page ignores movability of the
> > >>>gigantic page and it uses any existing zone. Considering that
> > >>>hugepage_migration_supported only supports 2MB and pgd level hugepages
> > >>>then 1GB pages are not migratable and as such allocating them from a
> > >>>movable zone will break the basic expectation of this zone. Standard
> > >>>hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> > >>>believe we should do the same for gigantic pages as well.
> > >>>
> > >>>I suspect this behavior is not intentional. What do you think about the
> > >>>following untested patch?
> > >>
> > >>
> > >>I also noticed an unrelated issue with the usage of
> > >>start_isolate_page_range. On error we set the migrate type to
> > >>MIGRATE_MOVABLE.
> > >
> > >Why that should be a problem? I think it is perfectly OK to have
> > >MIGRATE_MOVABLE pageblocks inside kernel zones.
> > >
> >
> > we can pick pages with migrate type movable and if we fail to isolate won't
> > we set the migrate type of that pages to MOVABLE ?
>
> I do not see an immediate problem. GFP_KERNEL allocations can fallback
> to movable migrate pageblocks AFAIR. But I am not very much familiar
> with migratetypes. Vlastimil, could you have a look please?

--
Michal Hocko
SUSE Labs

2017-07-27 11:57:50

by Vlastimil Babka

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

On 07/27/2017 10:22 AM, Michal Hocko wrote:
> [CC for real]
>
> On Thu 27-07-17 10:12:36, Michal Hocko wrote:
>> On Thu 27-07-17 13:30:31, Aneesh Kumar K.V wrote:
>>>
>>>
>>> On 07/27/2017 12:58 PM, Michal Hocko wrote:
>>>> On Thu 27-07-17 07:52:08, Aneesh Kumar K.V wrote:
>>>>> Michal Hocko <[email protected]> writes:
>>>>>
>>>>>> Hi,
>>>>>> I've just noticed that alloc_gigantic_page ignores movability of the
>>>>>> gigantic page and it uses any existing zone. Considering that
>>>>>> hugepage_migration_supported only supports 2MB and pgd level hugepages
>>>>>> then 1GB pages are not migratable and as such allocating them from a
>>>>>> movable zone will break the basic expectation of this zone. Standard
>>>>>> hugetlb allocations try to avoid that by using htlb_alloc_mask and I
>>>>>> believe we should do the same for gigantic pages as well.
>>>>>>
>>>>>> I suspect this behavior is not intentional. What do you think about the
>>>>>> following untested patch?
>>>>>
>>>>>
>>>>> I also noticed an unrelated issue with the usage of
>>>>> start_isolate_page_range. On error we set the migrate type to
>>>>> MIGRATE_MOVABLE.
>>>>
>>>> Why that should be a problem? I think it is perfectly OK to have
>>>> MIGRATE_MOVABLE pageblocks inside kernel zones.
>>>>
>>>
>>> we can pick pages with migrate type movable and if we fail to isolate won't

^ CMA

>>> we set the migrate type of that pages to MOVABLE ?

Yes, it seems we can silently kill CMA pageblocks in such case. Joonsoo,
can you check?

>>
>> I do not see an immediate problem. GFP_KERNEL allocations can fallback
>> to movable migrate pageblocks AFAIR. But I am not very much familiar
>> with migratetypes. Vlastimil, could you have a look please?
>

2017-07-28 20:48:40

by Mike Kravetz

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

On 07/26/2017 03:50 AM, Michal Hocko wrote:
> Hi,
> I've just noticed that alloc_gigantic_page ignores movability of the
> gigantic page and it uses any existing zone. Considering that
> hugepage_migration_supported only supports 2MB and pgd level hugepages
> then 1GB pages are not migratable and as such allocating them from a
> movable zone will break the basic expectation of this zone. Standard
> hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> believe we should do the same for gigantic pages as well.
>
> I suspect this behavior is not intentional. What do you think about the
> following untested patch?
> ---
> From 542d32c1eca7dcf38afca1a91bca4a472f6e8651 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <[email protected]>
> Date: Wed, 26 Jul 2017 12:43:43 +0200
> Subject: [PATCH] mm, hugetlb: do not allocate non-migrateable gigantic pages
> from movable zones
>
> alloc_gigantic_page doesn't consider movability of the gigantic hugetlb
> when scanning eligible ranges for the allocation. As 1GB hugetlb pages
> are not movable currently this can break the movable zone assumption
> that all allocations are migrateable and as such break memory hotplug.
>
> Reorganize the code and use the standard zonelist allocations scheme
> that we use for standard hugetbl pages. htlb_alloc_mask will ensure that
> only migratable hugetlb pages will ever see a movable zone.
>
> Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
> Signed-off-by: Michal Hocko <[email protected]>

This seems reasonable to me, and I like the fact that the code is more
like the default huge page case. I don't see any issues with the code.
I did some simple smoke testing of allocating 1G pages with the new code
and ensuring they ended up as expected.

Reviewed-by: Mike Kravetz <[email protected]>

--
Mike Kravetz

> ---
> mm/hugetlb.c | 35 ++++++++++++++++++++---------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bc48ee783dd9..60530bb3d228 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1066,11 +1066,11 @@ static void free_gigantic_page(struct page *page, unsigned int order)
> }
>
> static int __alloc_gigantic_page(unsigned long start_pfn,
> - unsigned long nr_pages)
> + unsigned long nr_pages, gfp_t gfp_mask)
> {
> unsigned long end_pfn = start_pfn + nr_pages;
> return alloc_contig_range(start_pfn, end_pfn, MIGRATE_MOVABLE,
> - GFP_KERNEL);
> + gfp_mask);
> }
>
> static bool pfn_range_valid_gigantic(struct zone *z,
> @@ -1108,19 +1108,24 @@ static bool zone_spans_last_pfn(const struct zone *zone,
> return zone_spans_pfn(zone, last_pfn);
> }
>
> -static struct page *alloc_gigantic_page(int nid, unsigned int order)
> +static struct page *alloc_gigantic_page(int nid, struct hstate *h)
> {
> + unsigned int order = huge_page_order(h);
> unsigned long nr_pages = 1 << order;
> unsigned long ret, pfn, flags;
> - struct zone *z;
> + struct zonelist *zonelist;
> + struct zone *zone;
> + struct zoneref *z;
> + gfp_t gfp_mask;
>
> - z = NODE_DATA(nid)->node_zones;
> - for (; z - NODE_DATA(nid)->node_zones < MAX_NR_ZONES; z++) {
> - spin_lock_irqsave(&z->lock, flags);
> + gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> + zonelist = node_zonelist(nid, gfp_mask);
> + for_each_zone_zonelist_nodemask(zone, z, zonelist, gfp_zone(gfp_mask), NULL) {
> + spin_lock_irqsave(&zone->lock, flags);
>
> - pfn = ALIGN(z->zone_start_pfn, nr_pages);
> - while (zone_spans_last_pfn(z, pfn, nr_pages)) {
> - if (pfn_range_valid_gigantic(z, pfn, nr_pages)) {
> + pfn = ALIGN(zone->zone_start_pfn, nr_pages);
> + while (zone_spans_last_pfn(zone, pfn, nr_pages)) {
> + if (pfn_range_valid_gigantic(zone, pfn, nr_pages)) {
> /*
> * We release the zone lock here because
> * alloc_contig_range() will also lock the zone
> @@ -1128,16 +1133,16 @@ static struct page *alloc_gigantic_page(int nid, unsigned int order)
> * spinning on this lock, it may win the race
> * and cause alloc_contig_range() to fail...
> */
> - spin_unlock_irqrestore(&z->lock, flags);
> - ret = __alloc_gigantic_page(pfn, nr_pages);
> + spin_unlock_irqrestore(&zone->lock, flags);
> + ret = __alloc_gigantic_page(pfn, nr_pages, gfp_mask);
> if (!ret)
> return pfn_to_page(pfn);
> - spin_lock_irqsave(&z->lock, flags);
> + spin_lock_irqsave(&zone->lock, flags);
> }
> pfn += nr_pages;
> }
>
> - spin_unlock_irqrestore(&z->lock, flags);
> + spin_unlock_irqrestore(&zone->lock, flags);
> }
>
> return NULL;
> @@ -1150,7 +1155,7 @@ static struct page *alloc_fresh_gigantic_page_node(struct hstate *h, int nid)
> {
> struct page *page;
>
> - page = alloc_gigantic_page(nid, huge_page_order(h));
> + page = alloc_gigantic_page(nid, h);
> if (page) {
> prep_compound_gigantic_page(page, huge_page_order(h));
> prep_new_huge_page(h, page, nid);
>

2017-07-31 06:47:46

by Michal Hocko

[permalink] [raw]
Subject: Re: gigantic hugepages vs. movable zones

On Fri 28-07-17 13:48:28, Mike Kravetz wrote:
> On 07/26/2017 03:50 AM, Michal Hocko wrote:
> > Hi,
> > I've just noticed that alloc_gigantic_page ignores movability of the
> > gigantic page and it uses any existing zone. Considering that
> > hugepage_migration_supported only supports 2MB and pgd level hugepages
> > then 1GB pages are not migratable and as such allocating them from a
> > movable zone will break the basic expectation of this zone. Standard
> > hugetlb allocations try to avoid that by using htlb_alloc_mask and I
> > believe we should do the same for gigantic pages as well.
> >
> > I suspect this behavior is not intentional. What do you think about the
> > following untested patch?
> > ---
> > From 542d32c1eca7dcf38afca1a91bca4a472f6e8651 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <[email protected]>
> > Date: Wed, 26 Jul 2017 12:43:43 +0200
> > Subject: [PATCH] mm, hugetlb: do not allocate non-migrateable gigantic pages
> > from movable zones
> >
> > alloc_gigantic_page doesn't consider movability of the gigantic hugetlb
> > when scanning eligible ranges for the allocation. As 1GB hugetlb pages
> > are not movable currently this can break the movable zone assumption
> > that all allocations are migrateable and as such break memory hotplug.
> >
> > Reorganize the code and use the standard zonelist allocations scheme
> > that we use for standard hugetbl pages. htlb_alloc_mask will ensure that
> > only migratable hugetlb pages will ever see a movable zone.
> >
> > Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
> > Signed-off-by: Michal Hocko <[email protected]>
>
> This seems reasonable to me, and I like the fact that the code is more
> like the default huge page case. I don't see any issues with the code.
> I did some simple smoke testing of allocating 1G pages with the new code
> and ensuring they ended up as expected.
>
> Reviewed-by: Mike Kravetz <[email protected]>

Thanks a lot Mike! I will play with this some more today and tomorrow
and send the final patch later this week.
--
Michal Hocko
SUSE Labs