2024-04-24 22:46:29

by Zi Yan

Subject: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

From: Zi Yan <[email protected]>

In __folio_remove_rmap(), a large folio is added to deferred split list
if any page in a folio loses its final mapping. It is possible that
the folio is unmapped fully, but it is unnecessary to add the folio
to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
before adding a folio to deferred split list. If the folio is already
on the deferred split list, it will be skipped. This issue applies to
both PTE-mapped THP and mTHP.

Commit 98046944a159 ("mm: huge_memory: add the missing
folio_test_pmd_mappable() for THP split statistics") tried to exclude
mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
However, this miscount was present even earlier due to implementation,
since PTEs are unmapped individually and first PTE unmapping adds the THP
into the deferred split list.

With commit b06dc281aa99 ("mm/rmap: introduce
folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
folios in one shot without causing the miscount, hence this patch.

Signed-off-by: Zi Yan <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
---
mm/rmap.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index a7913a454028..2809348add7b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
* page of the folio is unmapped and at least one page
* is still mapped.
*/
- if (folio_test_large(folio) && folio_test_anon(folio))
- if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
- deferred_split_folio(folio);
+ if (folio_test_large(folio) && folio_test_anon(folio) &&
+ ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
+ (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
+ deferred_split_folio(folio);
}

/*

base-commit: 2541ee5668b019c486dd3e815114130e35c1495d
--
2.43.0
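
To make the changed check concrete, the new condition can be modeled in plain
user-space C. This is only an illustrative sketch, not kernel code:
should_defer_split(), its parameters, and the sample values are invented for
this model; in __folio_remove_rmap() the "mapped" pointer refers to
folio->_nr_pages_mapped and nr_pmdmapped is the number of pages that were
covered by the PMD mapping being removed.

/* cc -o defer_model defer_model.c && ./defer_model */
#include <stdbool.h>
#include <stdio.h>

enum rmap_level { RMAP_LEVEL_PTE, RMAP_LEVEL_PMD };

/* Mirrors the post-patch condition shown in the hunk above. */
static bool should_defer_split(bool large, bool anon, enum rmap_level level,
                               int nr_pages_mapped, int nr, int nr_pmdmapped)
{
        return large && anon &&
               ((level == RMAP_LEVEL_PTE && nr_pages_mapped) ||
                (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped));
}

int main(void)
{
        /* Fully unmapping a PTE-mapped order-9 THP in one shot: not deferred. */
        printf("%d\n", should_defer_split(true, true, RMAP_LEVEL_PTE, 0, 512, 0));
        /* 256 of its pages remain mapped after the removal: deferred. */
        printf("%d\n", should_defer_split(true, true, RMAP_LEVEL_PTE, 256, 256, 0));
        /* Removing the PMD mapping with no PTE mappings underneath: not deferred. */
        printf("%d\n", should_defer_split(true, true, RMAP_LEVEL_PMD, 0, 512, 512));
        return 0;
}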



2024-04-25 03:46:00

by Lance Yang

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

Hey Zi,

On Thu, Apr 25, 2024 at 6:46 AM Zi Yan <[email protected]> wrote:
>
> From: Zi Yan <[email protected]>
>
> In __folio_remove_rmap(), a large folio is added to deferred split list
> if any page in a folio loses its final mapping. It is possible that
> the folio is unmapped fully, but it is unnecessary to add the folio

Agreed. If a folio is fully unmapped, then that's unnecessary to add
to the deferred split list.

> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
> before adding a folio to deferred split list. If the folio is already
> on the deferred split list, it will be skipped. This issue applies to
> both PTE-mapped THP and mTHP.
>
> Commit 98046944a159 ("mm: huge_memory: add the missing
> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> However, this miscount was present even earlier due to implementation,
> since PTEs are unmapped individually and first PTE unmapping adds the THP
> into the deferred split list.
>
> With commit b06dc281aa99 ("mm/rmap: introduce
> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
> folios in one shot without causing the miscount, hence this patch.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Yang Shi <[email protected]>
> ---
> mm/rmap.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a7913a454028..2809348add7b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> * page of the folio is unmapped and at least one page
> * is still mapped.
> */
> - if (folio_test_large(folio) && folio_test_anon(folio))
> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> - deferred_split_folio(folio);
> + if (folio_test_large(folio) && folio_test_anon(folio) &&
> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))

Perhaps we only need to check the mapcount?

IIUC, if a large folio that was PMD/PTE mapped is fully unmapped here,
then folio_mapcount() will return 0.

- if (folio_test_large(folio) && folio_test_anon(folio))
- if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
- deferred_split_folio(folio);
+ if (folio_test_large(folio) && folio_test_anon(folio) &&
+ folio_mapcount(folio))
+ deferred_split_folio(folio);

Thanks,
Lance




> + deferred_split_folio(folio);
> }
>
> /*
>
> base-commit: 2541ee5668b019c486dd3e815114130e35c1495d
> --
> 2.43.0
>

2024-04-25 07:20:29

by David Hildenbrand

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25.04.24 00:46, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> In __folio_remove_rmap(), a large folio is added to deferred split list
> if any page in a folio loses its final mapping. It is possible that
> the folio is unmapped fully, but it is unnecessary to add the folio
> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
> before adding a folio to deferred split list. If the folio is already
> on the deferred split list, it will be skipped. This issue applies to
> both PTE-mapped THP and mTHP.
>
> Commit 98046944a159 ("mm: huge_memory: add the missing
> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still

Once again: your patch won't fix it either.

> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> However, this miscount was present even earlier due to implementation,
> since PTEs are unmapped individually and first PTE unmapping adds the THP
> into the deferred split list.

It will still be present. Just less frequently.

>
> With commit b06dc281aa99 ("mm/rmap: introduce
> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
> folios in one shot without causing the miscount, hence this patch.
>
> Signed-off-by: Zi Yan <[email protected]>
> Reviewed-by: Yang Shi <[email protected]>
> ---
> mm/rmap.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a7913a454028..2809348add7b 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> * page of the folio is unmapped and at least one page
> * is still mapped.
> */
> - if (folio_test_large(folio) && folio_test_anon(folio))
> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> - deferred_split_folio(folio);
> + if (folio_test_large(folio) && folio_test_anon(folio) &&
> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> + deferred_split_folio(folio);

Please refrain from posting a new patch before the discussion on the old
one is done.

See my comments on v2 why optimizing out the function call is a
reasonable thing to do *where we cannot batch* and the misaccounting
will still happen. But it can be done independently.

--
Cheers,

David / dhildenb


2024-04-25 07:21:26

by David Hildenbrand

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25.04.24 05:45, Lance Yang wrote:
> Hey Zi,
>
> On Thu, Apr 25, 2024 at 6:46 AM Zi Yan <[email protected]> wrote:
>>
>> From: Zi Yan <[email protected]>
>>
>> In __folio_remove_rmap(), a large folio is added to deferred split list
>> if any page in a folio loses its final mapping. It is possible that
>> the folio is unmapped fully, but it is unnecessary to add the folio
>
> Agreed. If a folio is fully unmapped, then that's unnecessary to add
> to the deferred split list.
>
>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
>> before adding a folio to deferred split list. If the folio is already
>> on the deferred split list, it will be skipped. This issue applies to
>> both PTE-mapped THP and mTHP.
>>
>> Commit 98046944a159 ("mm: huge_memory: add the missing
>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
>> However, this miscount was present even earlier due to implementation,
>> since PTEs are unmapped individually and first PTE unmapping adds the THP
>> into the deferred split list.
>>
>> With commit b06dc281aa99 ("mm/rmap: introduce
>> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
>> folios in one shot without causing the miscount, hence this patch.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> Reviewed-by: Yang Shi <[email protected]>
>> ---
>> mm/rmap.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a7913a454028..2809348add7b 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>> * page of the folio is unmapped and at least one page
>> * is still mapped.
>> */
>> - if (folio_test_large(folio) && folio_test_anon(folio))
>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
>> - deferred_split_folio(folio);
>> + if (folio_test_large(folio) && folio_test_anon(folio) &&
>> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
>> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
>
> Perhaps we only need to check the mapcount?
>
> IIUC, if a large folio that was PMD/PTE mapped is fully unmapped here,
> then folio_mapcount() will return 0.

See discussion on v1. folio_large_mapcount() would achieve the same
without another folio_test_large() check, but in the context of this
patch it doesn't really matter.

--
Cheers,

David / dhildenb


2024-04-25 07:29:09

by Lance Yang

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On Thu, Apr 25, 2024 at 3:21 PM David Hildenbrand <[email protected]> wrote:
>
> On 25.04.24 05:45, Lance Yang wrote:
> > Hey Zi,
> >
> > On Thu, Apr 25, 2024 at 6:46 AM Zi Yan <[email protected]> wrote:
> >>
> >> From: Zi Yan <[email protected]>
> >>
> >> In __folio_remove_rmap(), a large folio is added to deferred split list
> >> if any page in a folio loses its final mapping. It is possible that
> >> the folio is unmapped fully, but it is unnecessary to add the folio
> >
> > Agreed. If a folio is fully unmapped, then that's unnecessary to add
> > to the deferred split list.
> >
> >> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
> >> before adding a folio to deferred split list. If the folio is already
> >> on the deferred split list, it will be skipped. This issue applies to
> >> both PTE-mapped THP and mTHP.
> >>
> >> Commit 98046944a159 ("mm: huge_memory: add the missing
> >> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> >> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> >> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> >> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> >> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> >> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> >> However, this miscount was present even earlier due to implementation,
> >> since PTEs are unmapped individually and first PTE unmapping adds the THP
> >> into the deferred split list.
> >>
> >> With commit b06dc281aa99 ("mm/rmap: introduce
> >> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
> >> folios in one shot without causing the miscount, hence this patch.
> >>
> >> Signed-off-by: Zi Yan <[email protected]>
> >> Reviewed-by: Yang Shi <[email protected]>
> >> ---
> >> mm/rmap.c | 7 ++++---
> >> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index a7913a454028..2809348add7b 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >> * page of the folio is unmapped and at least one page
> >> * is still mapped.
> >> */
> >> - if (folio_test_large(folio) && folio_test_anon(folio))
> >> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >> - deferred_split_folio(folio);
> >> + if (folio_test_large(folio) && folio_test_anon(folio) &&
> >> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> >> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> >
> > Perhaps we only need to check the mapcount?
> >
> > IIUC, if a large folio that was PMD/PTE mapped is fully unmapped here,
> > then folio_mapcount() will return 0.
>
> See discussion on v1. folio_large_mapcount() would achieve the same
> without another folio_test_large() check, but in the context of this
> patch it doesn't really matter.

Got it. Thanks for pointing that out!
I'll take a closer look at the discussion in v1.

Thanks,
Lance


>
> --
> Cheers,
>
> David / dhildenb
>

2024-04-25 07:29:51

by David Hildenbrand

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25.04.24 09:27, Lance Yang wrote:
> On Thu, Apr 25, 2024 at 3:21 PM David Hildenbrand <[email protected]> wrote:
>>
>> On 25.04.24 05:45, Lance Yang wrote:
>>> Hey Zi,
>>>
>>> On Thu, Apr 25, 2024 at 6:46 AM Zi Yan <[email protected]> wrote:
>>>>
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> In __folio_remove_rmap(), a large folio is added to deferred split list
>>>> if any page in a folio loses its final mapping. It is possible that
>>>> the folio is unmapped fully, but it is unnecessary to add the folio
>>>
>>> Agreed. If a folio is fully unmapped, then that's unnecessary to add
>>> to the deferred split list.
>>>
>>>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
>>>> before adding a folio to deferred split list. If the folio is already
>>>> on the deferred split list, it will be skipped. This issue applies to
>>>> both PTE-mapped THP and mTHP.
>>>>
>>>> Commit 98046944a159 ("mm: huge_memory: add the missing
>>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
>>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
>>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
>>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
>>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
>>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
>>>> However, this miscount was present even earlier due to implementation,
>>>> since PTEs are unmapped individually and first PTE unmapping adds the THP
>>>> into the deferred split list.
>>>>
>>>> With commit b06dc281aa99 ("mm/rmap: introduce
>>>> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
>>>> folios in one shot without causing the miscount, hence this patch.
>>>>
>>>> Signed-off-by: Zi Yan <[email protected]>
>>>> Reviewed-by: Yang Shi <[email protected]>
>>>> ---
>>>> mm/rmap.c | 7 ++++---
>>>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index a7913a454028..2809348add7b 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>>>> * page of the folio is unmapped and at least one page
>>>> * is still mapped.
>>>> */
>>>> - if (folio_test_large(folio) && folio_test_anon(folio))
>>>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
>>>> - deferred_split_folio(folio);
>>>> + if (folio_test_large(folio) && folio_test_anon(folio) &&
>>>> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
>>>> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
>>>
>>> Perhaps we only need to check the mapcount?
>>>
>>> IIUC, if a large folio that was PMD/PTE mapped is fully unmapped here,
>>> then folio_mapcount() will return 0.
>>
>> See discussion on v1. folio_large_mapcount() would achieve the same
>> without another folio_test_large() check, but in the context of this
>> patch it doesn't really matter.
>
> Got it. Thanks for pointing that out!
> I'll take a closer look at the discussion in v1.

Forgot to add: as long as the large mapcount patches are not upstream,
folio_large_mapcount() would be expensive. So this patch can be added
independent of the other stuff.

--
Cheers,

David / dhildenb


2024-04-25 07:35:34

by Lance Yang

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On Thu, Apr 25, 2024 at 3:29 PM David Hildenbrand <[email protected]> wrote:
>
> On 25.04.24 09:27, Lance Yang wrote:
> > On Thu, Apr 25, 2024 at 3:21 PM David Hildenbrand <[email protected]> wrote:
> >>
> >> On 25.04.24 05:45, Lance Yang wrote:
> >>> Hey Zi,
> >>>
> >>> On Thu, Apr 25, 2024 at 6:46 AM Zi Yan <[email protected]> wrote:
> >>>>
> >>>> From: Zi Yan <[email protected]>
> >>>>
> >>>> In __folio_remove_rmap(), a large folio is added to deferred split list
> >>>> if any page in a folio loses its final mapping. It is possible that
> >>>> the folio is unmapped fully, but it is unnecessary to add the folio
> >>>
> >>> Agreed. If a folio is fully unmapped, then that's unnecessary to add
> >>> to the deferred split list.
> >>>
> >>>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
> >>>> before adding a folio to deferred split list. If the folio is already
> >>>> on the deferred split list, it will be skipped. This issue applies to
> >>>> both PTE-mapped THP and mTHP.
> >>>>
> >>>> Commit 98046944a159 ("mm: huge_memory: add the missing
> >>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> >>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> >>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> >>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> >>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> >>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> >>>> However, this miscount was present even earlier due to implementation,
> >>>> since PTEs are unmapped individually and first PTE unmapping adds the THP
> >>>> into the deferred split list.
> >>>>
> >>>> With commit b06dc281aa99 ("mm/rmap: introduce
> >>>> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
> >>>> folios in one shot without causing the miscount, hence this patch.
> >>>>
> >>>> Signed-off-by: Zi Yan <[email protected]>
> >>>> Reviewed-by: Yang Shi <[email protected]>
> >>>> ---
> >>>> mm/rmap.c | 7 ++++---
> >>>> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>>> index a7913a454028..2809348add7b 100644
> >>>> --- a/mm/rmap.c
> >>>> +++ b/mm/rmap.c
> >>>> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>> * page of the folio is unmapped and at least one page
> >>>> * is still mapped.
> >>>> */
> >>>> - if (folio_test_large(folio) && folio_test_anon(folio))
> >>>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >>>> - deferred_split_folio(folio);
> >>>> + if (folio_test_large(folio) && folio_test_anon(folio) &&
> >>>> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> >>>> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> >>>
> >>> Perhaps we only need to check the mapcount?
> >>>
> >>> IIUC, if a large folio that was PMD/PTE mapped is fully unmapped here,
> >>> then folio_mapcount() will return 0.
> >>
> >> See discussion on v1. folio_large_mapcount() would achieve the same
> >> without another folio_test_large() check, but in the context of this
> >> patch it doesn't really matter.
> >
> > Got it. Thanks for pointing that out!
> > I'll take a closer look at the discussion in v1.
>
> Forgot to add: as long as the large mapcount patches are not upstream,
> folio_large_mapcount() would be expensive. So this patch can be added
> independent of the other stuff.

Thanks for clarifying!
Lance

>
> --
> Cheers,
>
> David / dhildenb
>

2024-04-25 15:15:22

by David Hildenbrand

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25.04.24 16:53, Zi Yan wrote:
> On 25 Apr 2024, at 3:19, David Hildenbrand wrote:
>
>> On 25.04.24 00:46, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> In __folio_remove_rmap(), a large folio is added to deferred split list
>>> if any page in a folio loses its final mapping. It is possible that
>>> the folio is unmapped fully, but it is unnecessary to add the folio
>>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
>>> before adding a folio to deferred split list. If the folio is already
>>> on the deferred split list, it will be skipped. This issue applies to
>>> both PTE-mapped THP and mTHP.
>>>
>>> Commit 98046944a159 ("mm: huge_memory: add the missing
>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
>>
>> Once again: your patch won't fix it either.
>>
>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
>>> However, this miscount was present even earlier due to implementation,
>>> since PTEs are unmapped individually and first PTE unmapping adds the THP
>>> into the deferred split list.
>>
>> It will still be present. Just less frequently.
>
> OK. Let me reread the email exchanges between you and Yang and clarify
> the details in the commit log.

Likely something like:

--
In __folio_remove_rmap(), a large folio is added to deferred split list
if any page in a folio loses its final mapping. But, it is possible that
the folio is now fully unmapped and adding it to the deferred split list
is unnecessary.

For PMD-mapped THPs, that was not really an issue, because removing the
last PMD mapping in the absence of PTE mappings would not have added the
folio to the deferred split queue.

However, for PTE-mapped THPs, which are now more prominent due to mTHP,
we will always end up adding them to the deferred split queue.

One side effect of this is that we will frequently increase the
THP_DEFERRED_SPLIT_PAGE stat for PTE-mapped THP, making it look like we
frequently get many partially mapped folios -- although we are simply
unmapping the whole thing stepwise.

Core-mm will now try batch-unmapping consecutive PTEs of PTE-mapped THPs
where possible. If we're lucky, we unmap the whole thing in one go and
can avoid adding the folio to the deferred split queue, reducing the
THP_DEFERRED_SPLIT_PAGE noise.

But there will still be noise when we cannot batch-unmap a complete
PTE-mapped folio in one go -- or where this type of batching is not
implemented yet.
--

Feel free to reuse what you consider reasonable.

--
Cheers,

David / dhildenb
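
As a rough illustration of the stepwise vs. batched behaviour described above,
here is a small user-space model. It is a sketch only: old_defer(), new_defer(),
defer_calls() and the batch sizes are invented for illustration, and it assumes
(as the patch description states) that deferred_split_folio() skips a folio that
is already on the deferred split list, so only the first reaching call actually
queues it and bumps THP_DEFERRED_SPLIT_PAGE.

/* cc -o split_noise split_noise.c && ./split_noise */
#include <stdbool.h>
#include <stdio.h>

#define NR_PAGES 512                        /* order-9 THP */

/* Old check at RMAP_LEVEL_PTE: defer on every call that unmaps pages. */
static bool old_defer(int still_mapped)
{
        (void)still_mapped;
        return true;
}

/* New check: defer only while some pages of the folio remain mapped. */
static bool new_defer(int still_mapped)
{
        return still_mapped != 0;
}

/*
 * Count how often the deferred_split_folio() call would be reached while
 * unmapping the whole folio in chunks of 'batch' PTEs per rmap removal.
 */
static int defer_calls(int batch, bool (*defer)(int))
{
        int mapped = NR_PAGES, calls = 0;

        while (mapped > 0) {
                mapped -= batch;
                if (defer(mapped))
                        calls++;
        }
        return calls;
}

int main(void)
{
        printf("stepwise (batch=1):   old=%d new=%d\n",
               defer_calls(1, old_defer), defer_calls(1, new_defer));
        printf("batched  (batch=512): old=%d new=%d\n",
               defer_calls(NR_PAGES, old_defer), defer_calls(NR_PAGES, new_defer));
        return 0;
}

The model reports 512 vs. 511 calls for the stepwise case -- the folio is still
queued on the first call either way, which is the "still present, just less
frequently" point -- and 1 vs. 0 for the fully batched case, where the new check
avoids touching the deferred split queue at all.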


2024-04-25 15:17:07

by Zi Yan

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25 Apr 2024, at 11:15, David Hildenbrand wrote:

> On 25.04.24 16:53, Zi Yan wrote:
>> On 25 Apr 2024, at 3:19, David Hildenbrand wrote:
>>
>>> On 25.04.24 00:46, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> In __folio_remove_rmap(), a large folio is added to deferred split list
>>>> if any page in a folio loses its final mapping. It is possible that
>>>> the folio is unmapped fully, but it is unnecessary to add the folio
>>>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
>>>> before adding a folio to deferred split list. If the folio is already
>>>> on the deferred split list, it will be skipped. This issue applies to
>>>> both PTE-mapped THP and mTHP.
>>>>
>>>> Commit 98046944a159 ("mm: huge_memory: add the missing
>>>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
>>>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
>>>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
>>>
>>> Once again: your patch won't fix it either.
>>>
>>>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
>>>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
>>>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
>>>> However, this miscount was present even earlier due to implementation,
>>>> since PTEs are unmapped individually and first PTE unmapping adds the THP
>>>> into the deferred split list.
>>>
>>> It will still be present. Just less frequently.
>>
>> OK. Let me reread the email exchanges between you and Yang and clarify
>> the details in the commit log.
>
> Likely something like:
>
> --
> In __folio_remove_rmap(), a large folio is added to deferred split list
> if any page in a folio loses its final mapping. But, it is possible that
> the folio is now fully unmapped and adding it to the deferred split list is unnecessary.
>
> For PMD-mapped THPs, that was not really an issue, because removing the last PMD mapping in the absence of PTE mappings would not have added the folio to the deferred split queue.
>
> However, for PTE-mapped THPs, which are now more prominent due to mTHP, we will always end up adding them to the deferred split queue.
>
> One side effect of this is that we will frequently increase the THP_DEFERRED_SPLIT_PAGE stat for PTE-mapped THP, making it look like we frequently get many partially mapped folios -- although we are simply
> unmapping the whole thing stepwise.
>
> Core-mm will now try batch-unmapping consecutive PTEs of PTE-mapped THPs where possible. If we're lucky, we unmap the whole thing in one go and can avoid adding the folio to the deferred split queue, reducing the THP_DEFERRED_SPLIT_PAGE noise.
>
> But there will still be noise when we cannot batch-unmap a complete PTE-mapped folio in one go -- or where this type of batching is not implemented yet.
> --
>
> Feel free to reuse what you consider reasonable.

Sure. Thank you a lot for drafting it!

--
Best Regards,
Yan, Zi



2024-04-25 15:23:01

by Zi Yan

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On 25 Apr 2024, at 3:19, David Hildenbrand wrote:

> On 25.04.24 00:46, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> In __folio_remove_rmap(), a large folio is added to deferred split list
>> if any page in a folio loses its final mapping. It is possible that
>> the folio is unmapped fully, but it is unnecessary to add the folio
>> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
>> before adding a folio to deferred split list. If the folio is already
>> on the deferred split list, it will be skipped. This issue applies to
>> both PTE-mapped THP and mTHP.
>>
>> Commit 98046944a159 ("mm: huge_memory: add the missing
>> folio_test_pmd_mappable() for THP split statistics") tried to exclude
>> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
>> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
>
> Once again: your patch won't fix it either.
>
>> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
>> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
>> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
>> However, this miscount was present even earlier due to implementation,
>> since PTEs are unmapped individually and first PTE unmapping adds the THP
>> into the deferred split list.
>
> It will still be present. Just less frequently.

OK. Let me reread the email exchanges between you and Yang and clarify
the details in the commit log.

>
>>
>> With commit b06dc281aa99 ("mm/rmap: introduce
>> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
>> folios in one shot without causing the miscount, hence this patch.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> Reviewed-by: Yang Shi <[email protected]>
>> ---
>> mm/rmap.c | 7 ++++---
>> 1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a7913a454028..2809348add7b 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
>> * page of the folio is unmapped and at least one page
>> * is still mapped.
>> */
>> - if (folio_test_large(folio) && folio_test_anon(folio))
>> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
>> - deferred_split_folio(folio);
>> + if (folio_test_large(folio) && folio_test_anon(folio) &&
>> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
>> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
>> + deferred_split_folio(folio);
>
> Please refrain from posting a new patch before the discussion on the old one is done.
>
> See my comments on v2 why optimizing out the function call is a reasonable thing to do *where we cannot batch* and the misaccounting will still happen. But it can be done independently.

Got it. Will keep the deferred list checking here and send a new one with commit
log changes too.

Thank you for the reviews.


--
Best Regards,
Yan, Zi



2024-04-25 15:49:52

by Yang Shi

Subject: Re: [PATCH v3] mm/rmap: do not add fully unmapped large folio to deferred split list

On Thu, Apr 25, 2024 at 7:53 AM Zi Yan <[email protected]> wrote:
>
> On 25 Apr 2024, at 3:19, David Hildenbrand wrote:
>
> > On 25.04.24 00:46, Zi Yan wrote:
> >> From: Zi Yan <[email protected]>
> >>
> >> In __folio_remove_rmap(), a large folio is added to deferred split list
> >> if any page in a folio loses its final mapping. It is possible that
> >> the folio is unmapped fully, but it is unnecessary to add the folio
> >> to deferred split list at all. Fix it by checking folio->_nr_pages_mapped
> >> before adding a folio to deferred split list. If the folio is already
> >> on the deferred split list, it will be skipped. This issue applies to
> >> both PTE-mapped THP and mTHP.
> >>
> >> Commit 98046944a159 ("mm: huge_memory: add the missing
> >> folio_test_pmd_mappable() for THP split statistics") tried to exclude
> >> mTHP deferred split stats from THP_DEFERRED_SPLIT_PAGE, but it does not
> >> fix the above issue. A fully unmapped PTE-mapped order-9 THP was still
> >
> > Once again: your patch won't fix it either.
> >
> >> added to deferred split list and counted as THP_DEFERRED_SPLIT_PAGE,
> >> since nr is 512 (non zero), level is RMAP_LEVEL_PTE, and inside
> >> deferred_split_folio() the order-9 folio is folio_test_pmd_mappable().
> >> However, this miscount was present even earlier due to implementation,
> >> since PTEs are unmapped individually and first PTE unmapping adds the THP
> >> into the deferred split list.
> >
> > It will still be present. Just less frequently.
>
> OK. Let me reread the email exchanges between you and Yang and clarify
> the details in the commit log.

There are still some places that may unmap a PTE-mapped THP at page
granularity, for example, migration.

>
> >
> >>
> >> With commit b06dc281aa99 ("mm/rmap: introduce
> >> folio_remove_rmap_[pte|ptes|pmd]()"), kernel is able to unmap PTE-mapped
> >> folios in one shot without causing the miscount, hence this patch.
> >>
> >> Signed-off-by: Zi Yan <[email protected]>
> >> Reviewed-by: Yang Shi <[email protected]>
> >> ---
> >> mm/rmap.c | 7 ++++---
> >> 1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index a7913a454028..2809348add7b 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1553,9 +1553,10 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >> * page of the folio is unmapped and at least one page
> >> * is still mapped.
> >> */
> >> - if (folio_test_large(folio) && folio_test_anon(folio))
> >> - if (level == RMAP_LEVEL_PTE || nr < nr_pmdmapped)
> >> - deferred_split_folio(folio);
> >> + if (folio_test_large(folio) && folio_test_anon(folio) &&
> >> + ((level == RMAP_LEVEL_PTE && atomic_read(mapped)) ||
> >> + (level == RMAP_LEVEL_PMD && nr < nr_pmdmapped)))
> >> + deferred_split_folio(folio);
> >
> > Please refrain from posting a new patch before the discussion on the old one is done.
> >
> > See my comments on v2 why optimizing out the function call is a reasonable thing to do *where we cannot batch* and the misaccounting will still happen. But it can be done independently.
>
> Got it. Will keep the deferred list checking here and send a new one with commit
> log changes too.
>
> Thank you for the reviews.
>
>
> --
> Best Regards,
> Yan, Zi