2022-11-24 10:19:22

by Gavin Shan

Subject: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

The issue is reported when removing memory through a virtio_mem device.
A transparent huge page that has experienced a copy-on-write fault is
wrongly regarded as pinned, so it escapes isolation in
isolate_migratepages_block(). As a result, the transparent huge page
can't be migrated and the corresponding memory block can't be put into
the offline state.

Fix it by replacing page_mapcount() with total_mapcount(). With this,
the transparent huge page can be isolated and migrated, and the memory
block can be put into the offline state. Besides, the page's refcount
is increased a bit earlier to avoid the page being released while the
check is executed.
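
As a rough illustration, the old and new checks can be modelled in
userspace with assumed, simplified counts for a PTE-mapped THP; the
numbers below are illustrative only and are not taken from a live
kernel:

/* pin-check-model.c: toy model of the racy "is this anon page pinned?"
 * heuristic. Assumption for the model: after the PMD mapping of a 2MB
 * THP is split into 512 PTE mappings, every mapped subpage contributes
 * one reference to the compound head and one mapping to the total
 * mapcount, while page_mapcount(head) only sees the head subpage's own
 * mapping.
 */
#include <stdbool.h>
#include <stdio.h>

/* old heuristic: page_count(page) > page_mapcount(page) */
static bool old_check(int page_count, int head_mapcount)
{
        return page_count > head_mapcount;
}

/* new heuristic, with one extra reference already taken by
 * get_page_unless_zero(): (page_count(page) - 1) > total_mapcount(page)
 */
static bool new_check(int page_count, int total_mapcount)
{
        return (page_count - 1) > total_mapcount;
}

int main(void)
{
        int nr_subpages = 512;          /* 2MB THP, 4KB base pages */
        int head_mapcount = 1;          /* only the head's own PTE mapping */
        int total_mapcount = nr_subpages;
        int page_count = nr_subpages;   /* assumed: one ref per mapped subpage */

        /* prints 1: the old check misreads the mapping refs as a pin */
        printf("old check says pinned: %d\n",
               old_check(page_count, head_mapcount));
        /* prints 0: the new check explains all refs, isolation proceeds */
        printf("new check says pinned: %d\n",
               new_check(page_count + 1, total_mapcount));
        return 0;
}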

Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
Cc: [email protected] # v5.7+
Reported-by: Zhenyu Zhang <[email protected]>
Suggested-by: David Hildenbrand <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
---
v2: Corrected fix tag and increase page's refcount before the check
---
mm/compaction.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index c51f7f545afe..1f6da31dd9a5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			goto isolate_fail;
 		}

+		/*
+		 * Be careful not to clear PageLRU until after we're
+		 * sure the page is not being freed elsewhere -- the
+		 * page release code relies on it.
+		 */
+		if (unlikely(!get_page_unless_zero(page)))
+			goto isolate_fail;
+
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an
 		 * admittedly racy check.
 		 */
 		mapping = page_mapping(page);
-		if (!mapping && page_count(page) > page_mapcount(page))
-			goto isolate_fail;
+		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
+			goto isolate_fail_put;

 		/*
 		 * Only allow to migrate anonymous pages in GFP_NOFS context
 		 * because those do not depend on fs locks.
 		 */
 		if (!(cc->gfp_mask & __GFP_FS) && mapping)
-			goto isolate_fail;
-
-		/*
-		 * Be careful not to clear PageLRU until after we're
-		 * sure the page is not being freed elsewhere -- the
-		 * page release code relies on it.
-		 */
-		if (unlikely(!get_page_unless_zero(page)))
-			goto isolate_fail;
+			goto isolate_fail_put;

 		/* Only take pages on LRU: a check now makes later tests safe */
 		if (!PageLRU(page))
--
2.23.0


2022-11-24 10:56:42

by Gavin Shan

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 11/24/22 6:09 PM, David Hildenbrand wrote:
> On 24.11.22 10:55, Gavin Shan wrote:
>> The issue is reported when removing memory through virtio_mem device.
>> The transparent huge page, experienced copy-on-write fault, is wrongly
>> regarded as pinned. The transparent huge page is escaped from being
>> isolated in isolate_migratepages_block(). The transparent huge page
>> can't be migrated and the corresponding memory block can't be put
>> into offline state.
>>
>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>> the transparent huge page can be isolated and migrated, and the memory
>> block can be put into offline state. Besides, The page's refcount is
>> increased a bit earlier to avoid the page is released when the check
>> is executed.
>
> Did you look into handling pages that are in the swapcache case as well?
>
> See is_refcount_suitable() in mm/khugepaged.c.
>
> Should be easy to reproduce, let me know if you need inspiration.
>

Nope, I didn't look into that case. Please elaborate on the details so
that I can reproduce it first.

>>
>> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
>> Cc: [email protected]   # v5.7+
>> Reported-by: Zhenyu Zhang <[email protected]>
>> Suggested-by: David Hildenbrand <[email protected]>
>> Signed-off-by: Gavin Shan <[email protected]>
>> ---
>> v2: Corrected fix tag and increase page's refcount before the check
>> ---
>>   mm/compaction.c | 22 +++++++++++-----------
>>   1 file changed, 11 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index c51f7f545afe..1f6da31dd9a5 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>               goto isolate_fail;
>>           }
>> +        /*
>> +         * Be careful not to clear PageLRU until after we're
>> +         * sure the page is not being freed elsewhere -- the
>> +         * page release code relies on it.
>> +         */
>> +        if (unlikely(!get_page_unless_zero(page)))
>> +            goto isolate_fail;
>> +
>>           /*
>>            * Migration will fail if an anonymous page is pinned in memory,
>>            * so avoid taking lru_lock and isolating it unnecessarily in an
>>            * admittedly racy check.
>>            */
>>           mapping = page_mapping(page);
>> -        if (!mapping && page_count(page) > page_mapcount(page))
>> -            goto isolate_fail;
>> +        if (!mapping && (page_count(page) - 1) > total_mapcount(page))
>> +            goto isolate_fail_put;
>>           /*
>>            * Only allow to migrate anonymous pages in GFP_NOFS context
>>            * because those do not depend on fs locks.
>>            */
>>           if (!(cc->gfp_mask & __GFP_FS) && mapping)
>> -            goto isolate_fail;
>> -
>> -        /*
>> -         * Be careful not to clear PageLRU until after we're
>> -         * sure the page is not being freed elsewhere -- the
>> -         * page release code relies on it.
>> -         */
>> -        if (unlikely(!get_page_unless_zero(page)))
>> -            goto isolate_fail;
>> +            goto isolate_fail_put;
>>           /* Only take pages on LRU: a check now makes later tests safe */
>>           if (!PageLRU(page))
>

Thanks,
Gavin

2022-11-24 10:57:01

by David Hildenbrand

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 24.11.22 10:55, Gavin Shan wrote:
> The issue is reported when removing memory through virtio_mem device.
> The transparent huge page, experienced copy-on-write fault, is wrongly
> regarded as pinned. The transparent huge page is escaped from being
> isolated in isolate_migratepages_block(). The transparent huge page
> can't be migrated and the corresponding memory block can't be put
> into offline state.
>
> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> the transparent huge page can be isolated and migrated, and the memory
> block can be put into offline state. Besides, The page's refcount is
> increased a bit earlier to avoid the page is released when the check
> is executed.

Did you look into handling pages that are in the swapcache case as well?

See is_refcount_suitable() in mm/khugepaged.c.
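
The idea behind that helper, paraphrased as a small userspace toy model
rather than the kernel code itself (the counts are assumptions for the
sake of the example): a page's references should all be explained by
its mappings, plus one extra reference per subpage when the page sits
in the swapcache.

/* refcount-model.c: toy model of the is_refcount_suitable() idea. */
#include <stdbool.h>
#include <stdio.h>

static bool refcount_suitable(int page_count, int total_mapcount,
                              bool in_swapcache, int nr_subpages)
{
        int expected = total_mapcount;

        /* each subpage of a swapcache page is assumed to hold one ref */
        if (in_swapcache)
                expected += nr_subpages;

        return page_count == expected;
}

int main(void)
{
        /* PTE-mapped THP that is also in the swapcache: 512 mapping
         * refs + 512 swapcache refs account for everything -> suitable.
         */
        printf("%d\n", refcount_suitable(1024, 512, true, 512));

        /* one extra, unexplained reference (e.g. a pin) -> not suitable */
        printf("%d\n", refcount_suitable(1025, 512, true, 512));
        return 0;
}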

Should be easy to reproduce, let me know if you need inspiration.

>
> Fixes: 1da2f328fa64 ("mm,thp,compaction,cma: allow THP migration for CMA allocations")
> Cc: [email protected] # v5.7+
> Reported-by: Zhenyu Zhang <[email protected]>
> Suggested-by: David Hildenbrand <[email protected]>
> Signed-off-by: Gavin Shan <[email protected]>
> ---
> v2: Corrected fix tag and increase page's refcount before the check
> ---
> mm/compaction.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index c51f7f545afe..1f6da31dd9a5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -984,29 +984,29 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_fail;
> }
>
> + /*
> + * Be careful not to clear PageLRU until after we're
> + * sure the page is not being freed elsewhere -- the
> + * page release code relies on it.
> + */
> + if (unlikely(!get_page_unless_zero(page)))
> + goto isolate_fail;
> +
> /*
> * Migration will fail if an anonymous page is pinned in memory,
> * so avoid taking lru_lock and isolating it unnecessarily in an
> * admittedly racy check.
> */
> mapping = page_mapping(page);
> - if (!mapping && page_count(page) > page_mapcount(page))
> - goto isolate_fail;
> + if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> + goto isolate_fail_put;
>
> /*
> * Only allow to migrate anonymous pages in GFP_NOFS context
> * because those do not depend on fs locks.
> */
> if (!(cc->gfp_mask & __GFP_FS) && mapping)
> - goto isolate_fail;
> -
> - /*
> - * Be careful not to clear PageLRU until after we're
> - * sure the page is not being freed elsewhere -- the
> - * page release code relies on it.
> - */
> - if (unlikely(!get_page_unless_zero(page)))
> - goto isolate_fail;
> + goto isolate_fail_put;
>
> /* Only take pages on LRU: a check now makes later tests safe */
> if (!PageLRU(page))

--
Thanks,

David / dhildenb

2022-11-24 11:16:32

by David Hildenbrand

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 24.11.22 11:21, Gavin Shan wrote:
> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>> On 24.11.22 10:55, Gavin Shan wrote:
>>> The issue is reported when removing memory through virtio_mem device.
>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>> regarded as pinned. The transparent huge page is escaped from being
>>> isolated in isolate_migratepages_block(). The transparent huge page
>>> can't be migrated and the corresponding memory block can't be put
>>> into offline state.
>>>
>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>> the transparent huge page can be isolated and migrated, and the memory
>>> block can be put into offline state. Besides, The page's refcount is
>>> increased a bit earlier to avoid the page is released when the check
>>> is executed.
>>
>> Did you look into handling pages that are in the swapcache case as well?
>>
>> See is_refcount_suitable() in mm/khugepaged.c.
>>
>> Should be easy to reproduce, let me know if you need inspiration.
>>
>
> Nope, I didn't look into the case. Please elaborate the details so that
> I can reproduce it firstly.


A simple reproducer would be (on a system with ordinary swap, not zram):

1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP

2) Enable THP for that region (MADV_HUGEPAGE)

3) Populate a THP (e.g., write access)

4) PTE-map the THP, for example, using MADV_FREE on the last subpage

5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

6) Read-access to some subpages to fault them in from the swapcache


Now you'd have a THP, which

1) Is partially PTE-mapped into the page table
2) Is in the swapcache (each subpage should have one reference from the
swapcache)


Now we could test whether alloc_contig_range() will still succeed
(e.g., using virtio-mem).
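
A minimal sketch of these steps (assuming a 2 MiB THP size, 4 KiB base
pages and a kernel/libc where MADV_FREE and MADV_PAGEOUT are available;
error handling kept to a minimum):

/* thp-swapcache.c: sketch of the reproducer steps above. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#define THP_SIZE        (2UL * 1024 * 1024)
#define BASE_PAGE_SIZE  4096UL

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT    21      /* Linux UAPI value */
#endif

int main(void)
{
        /* 1) mmap a region that can hold a THP (over-allocate to align it) */
        char *buf = mmap(NULL, 2 * THP_SIZE, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return 1;
        }
        char *thp = (char *)(((unsigned long)buf + THP_SIZE - 1) &
                             ~(THP_SIZE - 1));

        /* 2) Enable THP for that region */
        if (madvise(thp, THP_SIZE, MADV_HUGEPAGE))
                perror("MADV_HUGEPAGE");

        /* 3) Populate a THP via write access */
        memset(thp, 0xab, THP_SIZE);

        /* 4) PTE-map the THP, here by MADV_FREE on the last subpage */
        if (madvise(thp + THP_SIZE - BASE_PAGE_SIZE, BASE_PAGE_SIZE, MADV_FREE))
                perror("MADV_FREE");

        /* 5) Trigger swapout of the THP */
        if (madvise(thp, THP_SIZE, MADV_PAGEOUT))
                perror("MADV_PAGEOUT");

        /* 6) Read-access some subpages to fault them in from the swapcache */
        volatile char sink = 0;
        for (unsigned long off = 0; off < THP_SIZE / 2; off += BASE_PAGE_SIZE)
                sink += thp[off];
        (void)sink;

        /* keep the mapping alive while exercising alloc_contig_range(),
         * e.g. by unplugging memory via virtio-mem
         */
        printf("press enter to exit\n");
        getchar();
        return 0;
}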

--
Thanks,

David / dhildenb

2022-11-24 12:45:58

by Zi Yan

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation


On 24 Nov 2022, at 5:43, David Hildenbrand wrote:

> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
>
>
> A simple reproducer would be (on a system with ordinary swap (not zram))
>
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>
> 2) Enable THP for that region (MADV_HUGEPAGE)
>
> 3) Populate a THP (e.g., write access)
>
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT

Added the original THP swapout code author, Ying.

At this step, the THP will be split, right?

https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786

Even if a THP has a PMD mapping, IIRC, it is split in add_to_swap() and
then swapped out. But I cannot find that split code now.


>
> 6) Read-access to some subpages to fault them in from the swapcache
>
>
> Now you'd have a THP, which
>
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>
>
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>
> --
> Thanks,
>
> David / dhildenb

--
Best Regards,
Yan Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2022-11-24 13:02:44

by Gavin Shan

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 11/24/22 6:43 PM, David Hildenbrand wrote:
> On 24.11.22 11:21, Gavin Shan wrote:
>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>> The issue is reported when removing memory through virtio_mem device.
>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>> regarded as pinned. The transparent huge page is escaped from being
>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>> can't be migrated and the corresponding memory block can't be put
>>>> into offline state.
>>>>
>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>> the transparent huge page can be isolated and migrated, and the memory
>>>> block can be put into offline state. Besides, The page's refcount is
>>>> increased a bit earlier to avoid the page is released when the check
>>>> is executed.
>>>
>>> Did you look into handling pages that are in the swapcache case as well?
>>>
>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>
>>> Should be easy to reproduce, let me know if you need inspiration.
>>>
>>
>> Nope, I didn't look into the case. Please elaborate the details so that
>> I can reproduce it firstly.
>
>
> A simple reproducer would be (on a system with ordinary swap (not zram))
>
> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>
> 2) Enable THP for that region (MADV_HUGEPAGE)
>
> 3) Populate a THP (e.g., write access)
>
> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>
> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>
> 6) Read-access to some subpages to fault them in from the swapcache
>
>
> Now you'd have a THP, which
>
> 1) Is partially PTE-mapped into the page table
> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>
>
> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>

Thanks for the details. Steps (4) and (5) can actually be combined: swapping
out part of the THP (e.g. one subpage) will force the THP to be split.

I followed your steps in the attached program; there is no issue doing memory
hot-remove through virtio-mem, with or without this patch.

# numactl -p 1 testsuite mm swap -k
Any key to split THP
Any key to swap sub-pages
Any key to read the swapped sub-pages
Page[000]: 0xffffffffffffffff
Page[001]: 0xffffffffffffffff
:
Page[255]: 0xffffffffffffffff
Any key to exit // hold here and the program doesn't exit

(qemu) qom-set vm1 requested-size 0
[ 356.005396] virtio_mem virtio1: plugged size: 0x40000000
[ 356.005996] virtio_mem virtio1: requested size: 0x0
[ 356.350299] Fallback order for Node 0: 0 1
[ 356.350810] Fallback order for Node 1: 1 0
[ 356.351260] Built 2 zonelists, mobility grouping on. Total pages: 491343
[ 356.351998] Policy zone: DMA

Thanks,
Gavin


Attachments:
swap.c (3.33 kB)

2022-11-24 14:18:33

by David Hildenbrand

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 24.11.22 13:38, Zi Yan wrote:
>
> On 24 Nov 2022, at 5:43, David Hildenbrand wrote:
>
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>
> Added the original THP swapout code author, Ying.
>
> At this step, the THP will be split, right?
>
> https://elixir.bootlin.com/linux/latest/source/mm/vmscan.c#L1786
>
> Even if a THP has PMD mapping, IIRC, it is split in the add_to_swap()
> then swapped out. But I cannot find that split code now.

I recall there was some sequence to achieve it. Maybe it was
swapping out the PMD first and not triggering a PTE-mapping first.

mm/vmscan.c:shrink_folio_list()

if (folio_test_large(folio)) {
        /* cannot split folio, skip it */
        if (!can_split_folio(folio, NULL))
                goto activate_locked;
        /*
         * Split folios without a PMD map right
         * away. Chances are some or all of the
         * tail pages can be freed without IO.
         */
        if (!folio_entire_mapcount(folio) &&
            split_folio_to_list(folio, folio_list))
                goto activate_locked;
}

So the sequence might have to be

1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP

2) Enable THP for that region (MADV_HUGEPAGE)

3) Populate a THP (e.g., write access)

4) Trigger swapout of the THP, for example, using MADV_PAGEOUT

5) Access some subpage

As we don't have PMD swap entries, we will PTE-map the
THP during try_to_unmap() IIRC.



Independent of that, the check we have here also doesn't consider
ordinary order-0 pages that might be in the swapcache.

--
Thanks,

David / dhildenb

2022-11-24 14:19:03

by David Hildenbrand

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 24.11.22 14:22, David Hildenbrand wrote:
> On 24.11.22 13:55, Gavin Shan wrote:
>> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>>> On 24.11.22 11:21, Gavin Shan wrote:
>>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>>> can't be migrated and the corresponding memory block can't be put
>>>>>> into offline state.
>>>>>>
>>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>>> increased a bit earlier to avoid the page is released when the check
>>>>>> is executed.
>>>>>
>>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>>
>>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>>
>>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>>
>>>>
>>>> Nope, I didn't look into the case. Please elaborate the details so that
>>>> I can reproduce it firstly.
>>>
>>>
>>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>>
>>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>>
>>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>>
>>> 3) Populate a THP (e.g., write access)
>>>
>>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>>
>>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>>
>>> 6) Read-access to some subpages to fault them in from the swapcache
>>>
>>>
>>> Now you'd have a THP, which
>>>
>>> 1) Is partially PTE-mapped into the page table
>>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>>>
>>>
>>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>>
>>
>> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
>> the THP (e.g. one sub-page) will force the THP to be split.
>>
>> I followed your steps in the attached program, there is no issue to do memory hot-remove
>> through virtio-mem with or without this patch.
>
> Interesting. But I don't really see how we could pass this check with a
> page that's in the swapcache, maybe I'm missing something else.
>
> I'll try to see if I can reproduce it.
>

After some unsuccessful attempts and many head-scratches, I realized
that the reason we don't have to worry about swapcache pages here is
quite simple:

page_mapping() is != NULL for pages in the swapcache: folio_mapping()
makes this rather obvious:

        if (unlikely(folio_test_swapcache(folio)))
                return swap_address_space(folio_swap_entry(folio));


I think the get_page_unless_zero() might also be a fix for the
page_mapping() call; it smells like something that could blow up on
concurrent page freeing. (What about concurrent removal from the
swapcache? Nobody knows :) )


Thanks Gavin!

Acked-by: David Hildenbrand <[email protected]>


--
Thanks,

David / dhildenb

2022-11-24 14:21:11

by David Hildenbrand

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

On 24.11.22 13:55, Gavin Shan wrote:
> On 11/24/22 6:43 PM, David Hildenbrand wrote:
>> On 24.11.22 11:21, Gavin Shan wrote:
>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
>>>> On 24.11.22 10:55, Gavin Shan wrote:
>>>>> The issue is reported when removing memory through virtio_mem device.
>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
>>>>> regarded as pinned. The transparent huge page is escaped from being
>>>>> isolated in isolate_migratepages_block(). The transparent huge page
>>>>> can't be migrated and the corresponding memory block can't be put
>>>>> into offline state.
>>>>>
>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
>>>>> the transparent huge page can be isolated and migrated, and the memory
>>>>> block can be put into offline state. Besides, The page's refcount is
>>>>> increased a bit earlier to avoid the page is released when the check
>>>>> is executed.
>>>>
>>>> Did you look into handling pages that are in the swapcache case as well?
>>>>
>>>> See is_refcount_suitable() in mm/khugepaged.c.
>>>>
>>>> Should be easy to reproduce, let me know if you need inspiration.
>>>>
>>>
>>> Nope, I didn't look into the case. Please elaborate the details so that
>>> I can reproduce it firstly.
>>
>>
>> A simple reproducer would be (on a system with ordinary swap (not zram))
>>
>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
>>
>> 2) Enable THP for that region (MADV_HUGEPAGE)
>>
>> 3) Populate a THP (e.g., write access)
>>
>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
>>
>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
>>
>> 6) Read-access to some subpages to fault them in from the swapcache
>>
>>
>> Now you'd have a THP, which
>>
>> 1) Is partially PTE-mapped into the page table
>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
>>
>>
>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
>>
>
> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> the THP (e.g. one sub-page) will force the THP to be split.
>
> I followed your steps in the attached program, there is no issue to do memory hot-remove
> through virtio-mem with or without this patch.

Interesting. But I don't really see how we could pass this check with a
page that's in the swapcache; maybe I'm missing something else.

I'll try to see if I can reproduce it.

--
Thanks,

David / dhildenb

2022-11-25 08:11:47

by Zhenyu Zhang

Subject: Re: [PATCH v2] mm: migrate: Fix THP's mapcount on isolation

With the patch applied, I'm unable to hit the memory hot-remove failure
in the environment where the issue was initially found.

Tested-by: Zhenyu Zhang <[email protected]>

On Thu, Nov 24, 2022 at 10:09 PM David Hildenbrand <[email protected]> wrote:
>
> On 24.11.22 14:22, David Hildenbrand wrote:
> > On 24.11.22 13:55, Gavin Shan wrote:
> >> On 11/24/22 6:43 PM, David Hildenbrand wrote:
> >>> On 24.11.22 11:21, Gavin Shan wrote:
> >>>> On 11/24/22 6:09 PM, David Hildenbrand wrote:
> >>>>> On 24.11.22 10:55, Gavin Shan wrote:
> >>>>>> The issue is reported when removing memory through virtio_mem device.
> >>>>>> The transparent huge page, experienced copy-on-write fault, is wrongly
> >>>>>> regarded as pinned. The transparent huge page is escaped from being
> >>>>>> isolated in isolate_migratepages_block(). The transparent huge page
> >>>>>> can't be migrated and the corresponding memory block can't be put
> >>>>>> into offline state.
> >>>>>>
> >>>>>> Fix it by replacing page_mapcount() with total_mapcount(). With this,
> >>>>>> the transparent huge page can be isolated and migrated, and the memory
> >>>>>> block can be put into offline state. Besides, The page's refcount is
> >>>>>> increased a bit earlier to avoid the page is released when the check
> >>>>>> is executed.
> >>>>>
> >>>>> Did you look into handling pages that are in the swapcache case as well?
> >>>>>
> >>>>> See is_refcount_suitable() in mm/khugepaged.c.
> >>>>>
> >>>>> Should be easy to reproduce, let me know if you need inspiration.
> >>>>>
> >>>>
> >>>> Nope, I didn't look into the case. Please elaborate the details so that
> >>>> I can reproduce it firstly.
> >>>
> >>>
> >>> A simple reproducer would be (on a system with ordinary swap (not zram))
> >>>
> >>> 1) mmap a region (MAP_ANON|MAP_PRIVATE) that can hold a THP
> >>>
> >>> 2) Enable THP for that region (MADV_HUGEPAGE)
> >>>
> >>> 3) Populate a THP (e.g., write access)
> >>>
> >>> 4) PTE-map the THP, for example, using MADV_FREE on the last subpage
> >>>
> >>> 5) Trigger swapout of the THP, for example, using MADV_PAGEOUT
> >>>
> >>> 6) Read-access to some subpages to fault them in from the swapcache
> >>>
> >>>
> >>> Now you'd have a THP, which
> >>>
> >>> 1) Is partially PTE-mapped into the page table
> >>> 2) Is in the swapcache (each subpage should have one reference from the swapache)
> >>>
> >>>
> >>> Now we could test, if alloc_contig_range() will still succeed (e.g., using virtio-mem).
> >>>
> >>
> >> Thanks for the details. Step (4) and (5) can be actually combined. To swap part of
> >> the THP (e.g. one sub-page) will force the THP to be split.
> >>
> >> I followed your steps in the attached program, there is no issue to do memory hot-remove
> >> through virtio-mem with or without this patch.
> >
> > Interesting. But I don't really see how we could pass this check with a
> > page that's in the swapcache, maybe I'm missing something else.
> >
> > I'll try to see if I can reproduce it.
> >
>
> After some unsuccessful attempts and many head-scratches, I realized
> that it's quite simple why we don't have to worry about swapcache pages
> here:
>
> page_mapping() is != NULL for pages in the swapcache: folio_mapping()
> makes this rather obvious:
>
> if (unlikely(folio_test_swapcache(folio))
> return swap_address_space(folio_swap_entry(folio));
>
>
> I think the get_page_unless_zero() might also be a fix for the
> page_mapping() call, smells like something could blow up on concurrent
> page freeing. (what about concurrent removal from the swapcache? nobody
> knows :) )
>
>
> Thanks Gavin!
>
> Acked-by: David Hildenbrand <[email protected]>
>
>
> --
> Thanks,
>
> David / dhildenb
>