Commit b717d6b93b54 ("mm: compaction: include compound page count
for scanning in pageblock isolation") added compound page statistics
for scanning in pageblock isolation, to make sure the number of scanned
pages is always no less than the number of isolated pages when isolating
a migratable or free pageblock.

However, when isolation fails while scanning a migratable or free
pageblock, the failure path did not account for the compound pages that
were scanned, which can report an incorrect number of scanned pages in
tracepoints or vmstats and mislead people about the page scanning
pressure in memory compaction.

Thus, also count the scanned pages when isolating compound pages fails,
to keep the statistics accurate.
Signed-off-by: Baolin Wang <[email protected]>
---
mm/compaction.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mm/compaction.c b/mm/compaction.c
index 5a9501e0ae01..c9d9ad958e2a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -587,6 +587,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
blockpfn += (1UL << order) - 1;
cursor += (1UL << order) - 1;
}
+ nr_scanned += (1UL << order) - 1;
goto isolate_fail;
}
@@ -873,9 +874,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
cond_resched();
}
- nr_scanned++;
-
page = pfn_to_page(low_pfn);
+ nr_scanned += compound_nr(page);
/*
* Check if the pageblock has already been marked skipped.
@@ -1077,6 +1077,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
*/
if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
low_pfn += compound_nr(page) - 1;
+ nr_scanned += compound_nr(page) - 1;
SetPageLRU(page);
goto isolate_fail_put;
}
@@ -1097,7 +1098,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
isolate_success_no_list:
cc->nr_migratepages += compound_nr(page);
nr_isolated += compound_nr(page);
- nr_scanned += compound_nr(page) - 1;
/*
* Avoid isolating too much unless this block is being
--
2.27.0
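
For context, this is roughly where the nr_scanned and nr_isolated
counters from the patch above surface once isolate_migratepages_block()
finishes (a sketch paraphrased from mm/compaction.c of this era; exact
context lines may differ):

	/*
	 * nr_scanned feeds the tracepoint and, via compact_control, the
	 * COMPACTMIGRATE_SCANNED vmstat counter, so a failure path that
	 * forgets compound tail pages can make "scanned" look smaller
	 * than "isolated".
	 */
	trace_mm_compaction_isolate_migratepages(start_pfn, low_pfn,
						 nr_scanned, nr_isolated);

	cc->total_migrate_scanned += nr_scanned;
	if (nr_isolated)
		count_compact_events(COMPACTISOLATED, nr_isolated);
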
When trying to isolate a migratable pageblock, it can contain a mix of
normal pages and hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64).
That means we may still hold the lru lock taken for a normal page while
going on to isolate the next hugetlb page via
isolate_or_dissolve_huge_page() in the same migratable pageblock.

However, isolate_or_dissolve_huge_page() may allocate a new hugetlb page
and dissolve the old one via alloc_and_dissolve_hugetlb_folio() if the
hugetlb page's refcount is zero. That means we can still enter the
direct compaction path to allocate a new hugetlb page while holding the
current lru lock, which may cause a deadlock.

To avoid this possible deadlock, release the lru lock before trying to
isolate a hugetlb page. Moreover, it does not make sense to take the lru
lock to isolate a hugetlb page, which is not on an lru list.
Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
Signed-off-by: Baolin Wang <[email protected]>
---
mm/compaction.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index c9d9ad958e2a..ac8ff152421a 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
}
if (PageHuge(page) && cc->alloc_contig) {
+ if (locked) {
+ unlock_page_lruvec_irqrestore(locked, flags);
+ locked = NULL;
+ }
+
ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
/*
--
2.27.0
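
To make the deadlock scenario concrete, the call chain described in the
commit message above looks roughly like this (a sketch reconstructed
from that description, not a verified lockdep splat):

	isolate_migratepages_block()
	    /* lruvec->lru_lock still held for a normal page scanned
	     * earlier in the same pageblock */
	    isolate_or_dissolve_huge_page()
	        alloc_and_dissolve_hugetlb_folio()
	            /* old page's refcount is zero: allocate a fresh
	             * hugetlb page, which can enter direct compaction */
	            isolate_migratepages_block()
	                /* ...which may try to take an lruvec->lru_lock
	                 * again -> possible deadlock */
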
On 03/13/23 18:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain a mix of
> normal pages and hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64).
> That means we may still hold the lru lock taken for a normal page while
> going on to isolate the next hugetlb page via
> isolate_or_dissolve_huge_page() in the same migratable pageblock.
>
> However, isolate_or_dissolve_huge_page() may allocate a new hugetlb page
> and dissolve the old one via alloc_and_dissolve_hugetlb_folio() if the
> hugetlb page's refcount is zero. That means we can still enter the
> direct compaction path to allocate a new hugetlb page while holding the
> current lru lock, which may cause a deadlock.
>
> To avoid this possible deadlock, release the lru lock before trying to
> isolate a hugetlb page. Moreover, it does not make sense to take the lru
> lock to isolate a hugetlb page, which is not on an lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> mm/compaction.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
Thanks!
I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
not considered. However, I wonder if this can really happen in practice?
Before the code below, there is this:
/*
* Periodically drop the lock (if held) regardless of its
* contention, to give chance to IRQs. Abort completely if
* a fatal signal is pending.
*/
if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
if (locked) {
unlock_page_lruvec_irqrestore(locked, flags);
locked = NULL;
}
...
}
It would seem that the pfn of a hugetlb page would always be a multiple of
COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
that is ALWAYS true and would prefer something like the code you suggested.
Did you actually see this deadlock in practice?
--
Mike Kravetz
> @@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> }
>
> if (PageHuge(page) && cc->alloc_contig) {
> + if (locked) {
> + unlock_page_lruvec_irqrestore(locked, flags);
> + locked = NULL;
> + }
> +
> ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>
> /*
> --
> 2.27.0
>
>
On Mon, 13 Mar 2023 10:08:38 -0700 Mike Kravetz <[email protected]> wrote:
> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> /*
> * Periodically drop the lock (if held) regardless of its
> * contention, to give chance to IRQs. Abort completely if
> * a fatal signal is pending.
> */
> if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> if (locked) {
> unlock_page_lruvec_irqrestore(locked, flags);
> locked = NULL;
> }
> ...
> }
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.
>
> Did you actually see this deadlock in practice?
Presumably the lack of lockdep reports about this tells us something?
On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> On 03/13/23 18:37, Baolin Wang wrote:
>> When trying to isolate a migratable pageblock, it can contain a mix of
>> normal pages and hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64).
>> That means we may still hold the lru lock taken for a normal page while
>> going on to isolate the next hugetlb page via
>> isolate_or_dissolve_huge_page() in the same migratable pageblock.
>>
>> However, isolate_or_dissolve_huge_page() may allocate a new hugetlb page
>> and dissolve the old one via alloc_and_dissolve_hugetlb_folio() if the
>> hugetlb page's refcount is zero. That means we can still enter the
>> direct compaction path to allocate a new hugetlb page while holding the
>> current lru lock, which may cause a deadlock.
>>
>> To avoid this possible deadlock, release the lru lock before trying to
>> isolate a hugetlb page. Moreover, it does not make sense to take the lru
>> lock to isolate a hugetlb page, which is not on an lru list.
>>
>> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> mm/compaction.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c9d9ad958e2a..ac8ff152421a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>
> Thanks!
>
> I suspect holding the lru lock when calling isolate_or_dissolve_huge_page was
> not considered. However, I wonder if this can really happen in practice?
>
> Before the code below, there is this:
>
> /*
> * Periodically drop the lock (if held) regardless of its
> * contention, to give chance to IRQs. Abort completely if
> * a fatal signal is pending.
> */
> if (!(low_pfn % COMPACT_CLUSTER_MAX)) {
> if (locked) {
> unlock_page_lruvec_irqrestore(locked, flags);
> locked = NULL;
> }
> ...
> }
>
> It would seem that the pfn of a hugetlb page would always be a multiple of
> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> that is ALWAYS true and would prefer something like the code you suggested.
Well, this is not always true; consider the CONT-PTE hugetlb on the ARM
architecture, which contains 16 contiguous normal pages.
> Did you actually see this deadlock in practice?
I have not seen this issue in practice so far, but from code inspection
I think it can be triggered when trying to isolate a CONT-PTE hugetlb
page.
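
A concrete illustration (assuming 4K base pages and
COMPACT_CLUSTER_MAX == SWAP_CLUSTER_MAX == 32, the usual configuration):
a CONT-PTE 64K hugetlb page spans 16 pfns, so its head pfn is only
guaranteed to be 16-aligned, e.g.:

	low_pfn = 0x1010;		/* head pfn: 16-aligned, not 32-aligned */
	low_pfn % COMPACT_CLUSTER_MAX	/* == 16, non-zero */

so the periodic unlock quoted above is skipped, and the lru lock can
still be held when isolate_or_dissolve_huge_page() is called.
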
On 03/14/23 12:11, Baolin Wang wrote:
> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
> > On 03/13/23 18:37, Baolin Wang wrote:
> >
> > It would seem that the pfn of a hugetlb page would always be a multiple of
> > COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
> > that is ALWAYS true and would prefer something like the code you suggested.
>
> Well, this is not always true; consider the CONT-PTE hugetlb on the ARM
> architecture, which contains 16 contiguous normal pages.
>
Right. I keep forgetting about the CONT-* page sizes on arm :(
In any case, I think explicitly dropping the lock as you have done is a
good idea.
Feel free to add,
Reviewed-by: Mike Kravetz <[email protected]>
--
Mike Kravetz
On 3/15/2023 1:27 AM, Mike Kravetz wrote:
> On 03/14/23 12:11, Baolin Wang wrote:
>> On 3/14/2023 1:08 AM, Mike Kravetz wrote:
>>> On 03/13/23 18:37, Baolin Wang wrote:
>>>
>>> It would seem that the pfn of a hugetlb page would always be a multiple of
>>> COMPACT_CLUSTER_MAX so we would drop the lock. However, I am not sure if
>>> that is ALWAYS true and would prefer something like the code you suggested.
>>
>> Well, this is not always true; consider the CONT-PTE hugetlb on the ARM
>> architecture, which contains 16 contiguous normal pages.
>>
>
> Right. I keep forgetting about the CONT-* page sizes on arm :(
>
> In any case, I think explicitly dropping the lock as you have done is a
> good idea.
>
> Feel free to add,
>
> Reviewed-by: Mike Kravetz <[email protected]>
Thanks for reviewing.
On 3/13/23 11:37, Baolin Wang wrote:
> Commit b717d6b93b54 ("mm: compaction: include compound page count
> for scanning in pageblock isolation") added compound page statistics
> for scanning in pageblock isolation, to make sure the number of scanned
> pages is always no less than the number of isolated pages when isolating
> a migratable or free pageblock.
>
> However, when isolation fails while scanning a migratable or free
> pageblock, the failure path did not account for the compound pages that
> were scanned, which can report an incorrect number of scanned pages in
> tracepoints or vmstats and mislead people about the page scanning
> pressure in memory compaction.
>
> Thus, also count the scanned pages when isolating compound pages fails,
> to keep the statistics accurate.
>
> Signed-off-by: Baolin Wang <[email protected]>
> ---
> mm/compaction.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..c9d9ad958e2a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -587,6 +587,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
> blockpfn += (1UL << order) - 1;
> cursor += (1UL << order) - 1;
> }
> + nr_scanned += (1UL << order) - 1;
I'd rather put it in the block above that tests order < MAX_ORDER.
Otherwise, as the comments say, the value can be bogus since it's racy.
> goto isolate_fail;
> }
>
> @@ -873,9 +874,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> cond_resched();
> }
>
> - nr_scanned++;
> -
> page = pfn_to_page(low_pfn);
> + nr_scanned += compound_nr(page);
For the same reason, I'd rather leave the nr_scanned adjustment by order in
the specific code blocks where we know/think we have a compound or huge page
and have sanity checked the order/nr_pages, and not add an unchecked
compound_nr() here.
Thanks.
>
> /*
> * Check if the pageblock has already been marked skipped.
> @@ -1077,6 +1077,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> */
> if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
> low_pfn += compound_nr(page) - 1;
> + nr_scanned += compound_nr(page) - 1;
> SetPageLRU(page);
> goto isolate_fail_put;
> }
> @@ -1097,7 +1098,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> isolate_success_no_list:
> cc->nr_migratepages += compound_nr(page);
> nr_isolated += compound_nr(page);
> - nr_scanned += compound_nr(page) - 1;
>
> /*
> * Avoid isolating too much unless this block is being
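
Concretely, the placement suggested above would land in the existing
PageCompound() block of isolate_freepages_block(), something like this
(a sketch against the surrounding upstream context):

	if (PageCompound(page)) {
		const unsigned int order = compound_order(page);

		if (likely(order < MAX_ORDER)) {
			blockpfn += (1UL << order) - 1;
			cursor += (1UL << order) - 1;
			/*
			 * Only trust the racy compound_order() read, and
			 * hence only count the tail pages as scanned, once
			 * it has passed the MAX_ORDER sanity check.
			 */
			nr_scanned += (1UL << order) - 1;
		}
		goto isolate_fail;
	}
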
On 3/13/23 11:37, Baolin Wang wrote:
> When trying to isolate a migratable pageblock, it can contain a mix of
> normal pages and hugetlb pages (e.g. CONT-PTE 64K hugetlb on arm64).
> That means we may still hold the lru lock taken for a normal page while
> going on to isolate the next hugetlb page via
> isolate_or_dissolve_huge_page() in the same migratable pageblock.
>
> However, isolate_or_dissolve_huge_page() may allocate a new hugetlb page
> and dissolve the old one via alloc_and_dissolve_hugetlb_folio() if the
> hugetlb page's refcount is zero. That means we can still enter the
> direct compaction path to allocate a new hugetlb page while holding the
> current lru lock, which may cause a deadlock.
>
> To avoid this possible deadlock, release the lru lock before trying to
> isolate a hugetlb page. Moreover, it does not make sense to take the lru
> lock to isolate a hugetlb page, which is not on an lru list.
>
> Fixes: 369fa227c219 ("mm: make alloc_contig_range handle free hugetlb pages")
> Signed-off-by: Baolin Wang <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Thanks!
> ---
> mm/compaction.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c9d9ad958e2a..ac8ff152421a 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -893,6 +893,11 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> }
>
> if (PageHuge(page) && cc->alloc_contig) {
> + if (locked) {
> + unlock_page_lruvec_irqrestore(locked, flags);
> + locked = NULL;
> + }
> +
> ret = isolate_or_dissolve_huge_page(page, &cc->migratepages);
>
> /*
On 3/15/2023 11:54 PM, Vlastimil Babka wrote:
> On 3/13/23 11:37, Baolin Wang wrote:
>> Commit b717d6b93b54 ("mm: compaction: include compound page count
>> for scanning in pageblock isolation") added compound page statistics
>> for scanning in pageblock isolation, to make sure the number of scanned
>> pages is always no less than the number of isolated pages when isolating
>> a migratable or free pageblock.
>>
>> However, when isolation fails while scanning a migratable or free
>> pageblock, the failure path did not account for the compound pages that
>> were scanned, which can report an incorrect number of scanned pages in
>> tracepoints or vmstats and mislead people about the page scanning
>> pressure in memory compaction.
>>
>> Thus, also count the scanned pages when isolating compound pages fails,
>> to keep the statistics accurate.
>>
>> Signed-off-by: Baolin Wang <[email protected]>
>> ---
>> mm/compaction.c | 6 +++---
>> 1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..c9d9ad958e2a 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -587,6 +587,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>> blockpfn += (1UL << order) - 1;
>> cursor += (1UL << order) - 1;
>> }
>> + nr_scanned += (1UL << order) - 1;
>
> I'd rather put it in the block above that tests order < MAX_ORDER.
> Otherwise, as the comments say, the value can be bogus since it's racy.
Right, thanks for pointing it out. Will do in next version.
>
>> goto isolate_fail;
>> }
>>
>> @@ -873,9 +874,8 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> cond_resched();
>> }
>>
>> - nr_scanned++;
>> -
>> page = pfn_to_page(low_pfn);
>> + nr_scanned += compound_nr(page);
>
> For the same reason, I'd rather leave the nr_scanned adjustment by order in
> the specific code blocks where we know/think we have a compound or huge page
> and have sanity checked the order/nr_pages, and not add an unchecked
> compound_nr() here.
OK. Sounds reasonable to me. Thanks for your input.
>> /*
>> * Check if the pageblock has already been marked skipped.
>> @@ -1077,6 +1077,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> */
>> if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
>> low_pfn += compound_nr(page) - 1;
>> + nr_scanned += compound_nr(page) - 1;
>> SetPageLRU(page);
>> goto isolate_fail_put;
>> }
>> @@ -1097,7 +1098,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> isolate_success_no_list:
>> cc->nr_migratepages += compound_nr(page);
>> nr_isolated += compound_nr(page);
>> - nr_scanned += compound_nr(page) - 1;
>>
>> /*
>> * Avoid isolating too much unless this block is being
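
Taken together, the review suggests a v2 of the migrate-scanner hunks
along these lines (a sketch of one possible shape, not the actual
follow-up patch): keep the plain nr_scanned++ per base pfn, and add the
compound tail pages only in blocks where the page has already been
checked, e.g.:

	nr_scanned++;

	page = pfn_to_page(low_pfn);

	/* ... */

	if (unlikely(PageCompound(page) && !cc->alloc_contig)) {
		/* page is known compound here, so counting its tail
		 * pages as scanned matches what low_pfn skips over */
		low_pfn += compound_nr(page) - 1;
		nr_scanned += compound_nr(page) - 1;
		SetPageLRU(page);
		goto isolate_fail_put;
	}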