2021-02-08 10:52:06

by Oscar Salvador

[permalink] [raw]
Subject: [RFC PATCH 0/2] Make alloc_contig_range handle Hugetlb pages

Hi,

I promised to Mike to have a look into this a few weeks ago.
This is my first attempt.

I carried out some tests with a module that tries to allocate with
alloc_contig_range() from a range where we have free and in-use hugetlb
pages.
So far I did not spot any problem and it worked.

Please, note that it is not fully completed, as some things remain to be sorted
out (see list below), but I wanted to publish it to see whether the way I am
going makes sense to people, or I am doing something worrisome.

E.g:
- What happens when we allocated a new hugetlb page, but we cannot dissolve
the old one? (should not really happen (tm))
- When allocating the new hugetlb page I try to do it in the same node
the old one is. Should we be more flexible and allow to fallback to
other nodes?
Userspace can request hugetlb on specific nodes [1], but it can also
request them through generic interfaces [2].
- Problems I did not foresee yet

[1] /sys/devices/system/node/nodeX/hugepages/*
[2] /proc/sys/vm/nr_hugepages or /sys/kernel/mm/hugepages/*

Oscar Salvador (2):
mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages
mm,page_alloc: Make alloc_contig_range handle free hugetlb pages

include/linux/hugetlb.h | 6 ++++++
mm/compaction.c | 28 ++++++++++++++++++++++++++++
mm/hugetlb.c | 35 +++++++++++++++++++++++++++++++++++
mm/vmscan.c | 5 +++--
4 files changed, 72 insertions(+), 2 deletions(-)

--
2.16.3


2021-02-08 10:52:06

by Oscar Salvador

[permalink] [raw]
Subject: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

alloc_contig_range is not prepared to handle hugetlb pages and will
fail if it ever sees one, but since they can be migrated as any other
page (LRU and Movable), it makes sense to also handle them.

For now, do it only when coming from alloc_contig_range.

Signed-off-by: Oscar Salvador <[email protected]>
---
mm/compaction.c | 17 +++++++++++++++++
mm/vmscan.c | 5 +++--
2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index e5acb9714436..89cd2e60da29 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_fail;
}

+ /*
+ * Handle hugetlb pages only when coming from alloc_contig
+ */
+ if (PageHuge(page) && cc->alloc_contig) {
+ if (page_count(page)) {
+ /*
+ * Hugetlb page in-use. Isolate and migrate.
+ */
+ if (isolate_huge_page(page, &cc->migratepages)) {
+ low_pfn += compound_nr(page) - 1;
+ goto isolate_success_no_list;
+ }
+ }
+ goto isolate_fail;
+ }
+
/*
* Check may be lockless but that's ok as we recheck later.
* It's possible to migrate LRU and non-lru movable pages.
@@ -1041,6 +1057,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,

isolate_success:
list_add(&page->lru, &cc->migratepages);
+isolate_success_no_list:
cc->nr_migratepages += compound_nr(page);
nr_isolated += compound_nr(page);

diff --git a/mm/vmscan.c b/mm/vmscan.c
index b1b574ad199d..0803adca4469 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1506,8 +1506,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
LIST_HEAD(clean_pages);

list_for_each_entry_safe(page, next, page_list, lru) {
- if (page_is_file_lru(page) && !PageDirty(page) &&
- !__PageMovable(page) && !PageUnevictable(page)) {
+ if (!PageHuge(page) && page_is_file_lru(page) &&
+ !PageDirty(page) && !__PageMovable(page) &&
+ !PageUnevictable(page)) {
ClearPageActive(page);
list_move(&page->lru, &clean_pages);
}
--
2.16.3

2021-02-08 10:55:26

by Oscar Salvador

[permalink] [raw]
Subject: Re: [RFC PATCH 0/2] Make alloc_contig_range handle Hugetlb pages

On Mon, Feb 08, 2021 at 11:38:10AM +0100, Oscar Salvador wrote:
> Hi,

Heh, I've had a case of the mondays and I sent a first spin
without the ML, sorry for the spam.


--
Oscar Salvador
SUSE L3

2021-02-10 09:15:28

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On 08.02.21 11:38, Oscar Salvador wrote:
> alloc_contig_range is not prepared to handle hugetlb pages and will
> fail if it ever sees one, but since they can be migrated as any other
> page (LRU and Movable), it makes sense to also handle them.
>
> For now, do it only when coming from alloc_contig_range.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> mm/compaction.c | 17 +++++++++++++++++
> mm/vmscan.c | 5 +++--
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e5acb9714436..89cd2e60da29 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_fail;
> }
>
> + /*
> + * Handle hugetlb pages only when coming from alloc_contig
> + */
> + if (PageHuge(page) && cc->alloc_contig) {
> + if (page_count(page)) {

I wonder if we should care about races here. What if someone
concurrently allocates/frees?

Note that PageHuge() succeeds on tail pages, isolate_huge_page() not, i
assume we'll have to handle that as well.

I wonder if it would make sense to move some of the magic to hugetlb
code and handle it there with less chances for races (isolate if used,
alloc-and-dissolve if not).

> + /*
> + * Hugetlb page in-use. Isolate and migrate.
> + */
> + if (isolate_huge_page(page, &cc->migratepages)) {
> + low_pfn += compound_nr(page) - 1;
> + goto isolate_success_no_list;
> + }
> + }

--
Thanks,

David / dhildenb

2021-02-10 14:14:15

by Oscar Salvador

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
> On 08.02.21 11:38, Oscar Salvador wrote:
> > alloc_contig_range is not prepared to handle hugetlb pages and will
> > fail if it ever sees one, but since they can be migrated as any other
> > page (LRU and Movable), it makes sense to also handle them.
> >
> > For now, do it only when coming from alloc_contig_range.
> >
> > Signed-off-by: Oscar Salvador <[email protected]>
> > ---
> > mm/compaction.c | 17 +++++++++++++++++
> > mm/vmscan.c | 5 +++--
> > 2 files changed, 20 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/compaction.c b/mm/compaction.c
> > index e5acb9714436..89cd2e60da29 100644
> > --- a/mm/compaction.c
> > +++ b/mm/compaction.c
> > @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > goto isolate_fail;
> > }
> > + /*
> > + * Handle hugetlb pages only when coming from alloc_contig
> > + */
> > + if (PageHuge(page) && cc->alloc_contig) {
> > + if (page_count(page)) {
>
> I wonder if we should care about races here. What if someone concurrently
> allocates/frees?
>
> Note that PageHuge() succeeds on tail pages, isolate_huge_page() not, i
> assume we'll have to handle that as well.
>
> I wonder if it would make sense to move some of the magic to hugetlb code
> and handle it there with less chances for races (isolate if used,
> alloc-and-dissolve if not).

Yes, it makes sense to keep the magic in hugetlb code.
Note, though, that removing all races might be tricky.

isolate_huge_page() checks for PageHuge under hugetlb_lock,
so there is a race between a call to PageHuge(x) and a subsequent
call to isolate_huge_page().
But we should be fine as isolate_huge_page will fail in case the page is
no longer HugeTLB.

Also, since isolate_migratepages_block() gets called with ranges
pageblock aligned, we should never be handling tail pages in the core
of the function. E.g: the same way we handle THP:

/* The whole page is taken off the LRU; skip the tail pages. */
if (PageCompound(page))
low_pfn += compound_nr(page) - 1;

But all in all, the code has to be more bullet-proof. This RFC was more
like a PoC to see whether something crazy was done.
And as I said, moving the handling of hugetlb pages to hugetlb.c might
help towards a better error-race-handling.

Thanks for having a look ;-)

--
Oscar Salvador
SUSE L3

2021-02-10 14:18:00

by Oscar Salvador

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On Wed, Feb 10, 2021 at 03:11:05PM +0100, David Hildenbrand wrote:
> On 10.02.21 15:09, Oscar Salvador wrote:
> > On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
> > > On 08.02.21 11:38, Oscar Salvador wrote:
> > > > alloc_contig_range is not prepared to handle hugetlb pages and will
> > > > fail if it ever sees one, but since they can be migrated as any other
> > > > page (LRU and Movable), it makes sense to also handle them.
> > > >
> > > > For now, do it only when coming from alloc_contig_range.
> > > >
> > > > Signed-off-by: Oscar Salvador <[email protected]>
> > > > ---
> > > > mm/compaction.c | 17 +++++++++++++++++
> > > > mm/vmscan.c | 5 +++--
> > > > 2 files changed, 20 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/mm/compaction.c b/mm/compaction.c
> > > > index e5acb9714436..89cd2e60da29 100644
> > > > --- a/mm/compaction.c
> > > > +++ b/mm/compaction.c
> > > > @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> > > > goto isolate_fail;
> > > > }
> > > > + /*
> > > > + * Handle hugetlb pages only when coming from alloc_contig
> > > > + */
> > > > + if (PageHuge(page) && cc->alloc_contig) {
> > > > + if (page_count(page)) {
> > >
> > > I wonder if we should care about races here. What if someone concurrently
> > > allocates/frees?
> > >
> > > Note that PageHuge() succeeds on tail pages, isolate_huge_page() not, i
> > > assume we'll have to handle that as well.
> > >
> > > I wonder if it would make sense to move some of the magic to hugetlb code
> > > and handle it there with less chances for races (isolate if used,
> > > alloc-and-dissolve if not).
> >
> > Yes, it makes sense to keep the magic in hugetlb code.
> > Note, though, that removing all races might be tricky.
> >
> > isolate_huge_page() checks for PageHuge under hugetlb_lock,
> > so there is a race between a call to PageHuge(x) and a subsequent
> > call to isolate_huge_page().
> > But we should be fine as isolate_huge_page will fail in case the page is
> > no longer HugeTLB.
> >
> > Also, since isolate_migratepages_block() gets called with ranges
> > pageblock aligned, we should never be handling tail pages in the core
> > of the function. E.g: the same way we handle THP:
>
> Gigantic pages? (spoiler: see my comments to next patch :) )

Oh, yeah, that sucks.
We had the same problem in scan_movable_pages/has_unmovable_pages
with such pages.

Uhm, I will try to be more careful :-)

--
Oscar Salvador
SUSE L3

2021-02-10 14:18:37

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On 10.02.21 15:09, Oscar Salvador wrote:
> On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
>> On 08.02.21 11:38, Oscar Salvador wrote:
>>> alloc_contig_range is not prepared to handle hugetlb pages and will
>>> fail if it ever sees one, but since they can be migrated as any other
>>> page (LRU and Movable), it makes sense to also handle them.
>>>
>>> For now, do it only when coming from alloc_contig_range.
>>>
>>> Signed-off-by: Oscar Salvador <[email protected]>
>>> ---
>>> mm/compaction.c | 17 +++++++++++++++++
>>> mm/vmscan.c | 5 +++--
>>> 2 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index e5acb9714436..89cd2e60da29 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>> goto isolate_fail;
>>> }
>>> + /*
>>> + * Handle hugetlb pages only when coming from alloc_contig
>>> + */
>>> + if (PageHuge(page) && cc->alloc_contig) {
>>> + if (page_count(page)) {
>>
>> I wonder if we should care about races here. What if someone concurrently
>> allocates/frees?
>>
>> Note that PageHuge() succeeds on tail pages, isolate_huge_page() not, i
>> assume we'll have to handle that as well.
>>
>> I wonder if it would make sense to move some of the magic to hugetlb code
>> and handle it there with less chances for races (isolate if used,
>> alloc-and-dissolve if not).
>
> Yes, it makes sense to keep the magic in hugetlb code.
> Note, though, that removing all races might be tricky.
>
> isolate_huge_page() checks for PageHuge under hugetlb_lock,
> so there is a race between a call to PageHuge(x) and a subsequent
> call to isolate_huge_page().
> But we should be fine as isolate_huge_page will fail in case the page is
> no longer HugeTLB.
>
> Also, since isolate_migratepages_block() gets called with ranges
> pageblock aligned, we should never be handling tail pages in the core
> of the function. E.g: the same way we handle THP:

Gigantic pages? (spoiler: see my comments to next patch :) )

--
Thanks,

David / dhildenb

2021-02-10 14:25:08

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On 10.02.21 15:14, Oscar Salvador wrote:
> On Wed, Feb 10, 2021 at 03:11:05PM +0100, David Hildenbrand wrote:
>> On 10.02.21 15:09, Oscar Salvador wrote:
>>> On Wed, Feb 10, 2021 at 09:56:37AM +0100, David Hildenbrand wrote:
>>>> On 08.02.21 11:38, Oscar Salvador wrote:
>>>>> alloc_contig_range is not prepared to handle hugetlb pages and will
>>>>> fail if it ever sees one, but since they can be migrated as any other
>>>>> page (LRU and Movable), it makes sense to also handle them.
>>>>>
>>>>> For now, do it only when coming from alloc_contig_range.
>>>>>
>>>>> Signed-off-by: Oscar Salvador <[email protected]>
>>>>> ---
>>>>> mm/compaction.c | 17 +++++++++++++++++
>>>>> mm/vmscan.c | 5 +++--
>>>>> 2 files changed, 20 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>>> index e5acb9714436..89cd2e60da29 100644
>>>>> --- a/mm/compaction.c
>>>>> +++ b/mm/compaction.c
>>>>> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>>>> goto isolate_fail;
>>>>> }
>>>>> + /*
>>>>> + * Handle hugetlb pages only when coming from alloc_contig
>>>>> + */
>>>>> + if (PageHuge(page) && cc->alloc_contig) {
>>>>> + if (page_count(page)) {
>>>>
>>>> I wonder if we should care about races here. What if someone concurrently
>>>> allocates/frees?
>>>>
>>>> Note that PageHuge() succeeds on tail pages, isolate_huge_page() not, i
>>>> assume we'll have to handle that as well.
>>>>
>>>> I wonder if it would make sense to move some of the magic to hugetlb code
>>>> and handle it there with less chances for races (isolate if used,
>>>> alloc-and-dissolve if not).
>>>
>>> Yes, it makes sense to keep the magic in hugetlb code.
>>> Note, though, that removing all races might be tricky.
>>>
>>> isolate_huge_page() checks for PageHuge under hugetlb_lock,
>>> so there is a race between a call to PageHuge(x) and a subsequent
>>> call to isolate_huge_page().
>>> But we should be fine as isolate_huge_page will fail in case the page is
>>> no longer HugeTLB.
>>>
>>> Also, since isolate_migratepages_block() gets called with ranges
>>> pageblock aligned, we should never be handling tail pages in the core
>>> of the function. E.g: the same way we handle THP:
>>
>> Gigantic pages? (spoiler: see my comments to next patch :) )
>
> Oh, yeah, that sucks.
> We had the same problem in scan_movable_pages/has_unmovable_pages
> with such pages.
>
> Uhm, I will try to be more careful :-)
>

Gigantic pages are a minefield. Not your fault :)

--
Thanks,

David / dhildenb

2021-02-11 01:00:07

by Mike Kravetz

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On 2/8/21 2:38 AM, Oscar Salvador wrote:
> alloc_contig_range is not prepared to handle hugetlb pages and will
> fail if it ever sees one, but since they can be migrated as any other
> page (LRU and Movable), it makes sense to also handle them.
>
> For now, do it only when coming from alloc_contig_range.
>
> Signed-off-by: Oscar Salvador <[email protected]>
> ---
> mm/compaction.c | 17 +++++++++++++++++
> mm/vmscan.c | 5 +++--
> 2 files changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index e5acb9714436..89cd2e60da29 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_fail;
> }
>
> + /*
> + * Handle hugetlb pages only when coming from alloc_contig
> + */
> + if (PageHuge(page) && cc->alloc_contig) {
> + if (page_count(page)) {

Thanks for doing this!

I agree with everything in the discussion you and David had. This code
is racy, but since we are scanning lockless there is no way to eliminate
them all. Best to just minimize the windows and document.
--
Mike Kravetz

> + /*
> + * Hugetlb page in-use. Isolate and migrate.
> + */
> + if (isolate_huge_page(page, &cc->migratepages)) {
> + low_pfn += compound_nr(page) - 1;
> + goto isolate_success_no_list;
> + }
> + }
> + goto isolate_fail;
> + }
> +
> /*
> * Check may be lockless but that's ok as we recheck later.
> * It's possible to migrate LRU and non-lru movable pages.
> @@ -1041,6 +1057,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>
> isolate_success:
> list_add(&page->lru, &cc->migratepages);
> +isolate_success_no_list:
> cc->nr_migratepages += compound_nr(page);
> nr_isolated += compound_nr(page);
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index b1b574ad199d..0803adca4469 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1506,8 +1506,9 @@ unsigned int reclaim_clean_pages_from_list(struct zone *zone,
> LIST_HEAD(clean_pages);
>
> list_for_each_entry_safe(page, next, page_list, lru) {
> - if (page_is_file_lru(page) && !PageDirty(page) &&
> - !__PageMovable(page) && !PageUnevictable(page)) {
> + if (!PageHuge(page) && page_is_file_lru(page) &&
> + !PageDirty(page) && !__PageMovable(page) &&
> + !PageUnevictable(page)) {
> ClearPageActive(page);
> list_move(&page->lru, &clean_pages);
> }
>

2021-02-11 10:46:17

by David Hildenbrand

[permalink] [raw]
Subject: Re: [RFC PATCH 1/2] mm,page_alloc: Make alloc_contig_range handle in-use hugetlb pages

On 11.02.21 01:56, Mike Kravetz wrote:
> On 2/8/21 2:38 AM, Oscar Salvador wrote:
>> alloc_contig_range is not prepared to handle hugetlb pages and will
>> fail if it ever sees one, but since they can be migrated as any other
>> page (LRU and Movable), it makes sense to also handle them.
>>
>> For now, do it only when coming from alloc_contig_range.
>>
>> Signed-off-by: Oscar Salvador <[email protected]>
>> ---
>> mm/compaction.c | 17 +++++++++++++++++
>> mm/vmscan.c | 5 +++--
>> 2 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index e5acb9714436..89cd2e60da29 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -940,6 +940,22 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> goto isolate_fail;
>> }
>>
>> + /*
>> + * Handle hugetlb pages only when coming from alloc_contig
>> + */
>> + if (PageHuge(page) && cc->alloc_contig) {
>> + if (page_count(page)) {
>
> Thanks for doing this!
>
> I agree with everything in the discussion you and David had. This code
> is racy, but since we are scanning lockless there is no way to eliminate
> them all. Best to just minimize the windows and document.
>

Agreed - and make sure that we don't have strange side. (e.g., in the
next patch, allocate a new page, try to dissolve. Dissolving fails, what
happens to the just-allocated page?)

--
Thanks,

David / dhildenb