2023-12-11 15:57:23

by David Hildenbrand

Subject: [PATCH v1 00/39] mm/rmap: interface overhaul

This series overhauls the rmap interface to get rid of the "bool compound"
/ RMAP_COMPOUND parameter, with the goal of making the interface less
error-prone, more future-proof, and more natural to extend to "batching".
Also, this converts the interface to always consume folio+subpage, which
speeds up operations on large folios.

Further, this series adds PTE-batching variants for 4 rmap functions; only
folio_add_anon_rmap_ptes() is actually used for batching in this series,
namely when PTE-remapping a PMD-mapped THP. folio_remove_rmap_ptes(),
folio_try_dup_anon_rmap_ptes() and folio_dup_file_rmap_ptes() will soon
come in handy [1,2].
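
For example (a sketch based on the __split_huge_pmd_locked() conversion in
patch #15), PTE-remapping a PMD-mapped THP now adds all PTE rmaps of the
folio in a single call instead of once per subpage:

	folio_ref_add(folio, HPAGE_PMD_NR - 1);
	if (anon_exclusive)
		rmap_flags |= RMAP_EXCLUSIVE;
	folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
				 vma, haddr, rmap_flags);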

This series performs a lot of folio conversion along the way. Most of the
added LOC in the diff are only due to documentation.

As we're moving to a pte/pmd interface where we clearly express the
mapping granularity we are dealing with, we first get the remainder of
hugetlb out of the way, as it is special and expected to remain special: it
treats everything as a "single logical PTE" and currently only allows
entire mappings.

Even if we'd ever support partial mappings, I strongly assume the interface
and implementation will still differ heavily: hopefully we can avoid working
on subpages/subpage mapcounts completely and only add a "count" parameter
for them to enable batching.

New (extended) hugetlb interface that operates on entire folio:
* hugetlb_add_new_anon_rmap() -> Already existed
* hugetlb_add_anon_rmap() -> Already existed
* hugetlb_try_dup_anon_rmap()
* hugetlb_try_share_anon_rmap()
* hugetlb_add_file_rmap()
* hugetlb_remove_rmap()
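
For example (taken from the try_to_unmap_one() conversion in patch #2),
rmap removal ends up with a dedicated hugetlb branch instead of hiding
hugetlb behind a "compound" parameter:

	if (unlikely(folio_test_hugetlb(folio)))
		hugetlb_remove_rmap(folio);
	else
		page_remove_rmap(subpage, vma, false);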

New "ordinary" interface for small folios / THP::
* folio_add_new_anon_rmap() -> Already existed
* folio_add_anon_rmap_[pte|ptes|pmd]()
* folio_try_dup_anon_rmap_[pte|ptes|pmd]()
* folio_try_share_anon_rmap_[pte|pmd]()
* folio_add_file_rmap_[pte|ptes|pmd]()
* folio_dup_file_rmap_[pte|ptes|pmd]()
* folio_remove_rmap_[pte|ptes|pmd]()

folio_add_new_anon_rmap() will always map at the largest granularity
possible (currently, a single PMD to cover a PMD-sized THP). Could be
extended if ever required.
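
To illustrate how call sites change (sketched from the conversions in
patches #11 and #16), the mapping granularity moves from a runtime
"bool compound" / RMAP_COMPOUND argument into the function name:

	/* Old interface */
	page_add_file_rmap(page, vma, false);
	page_add_anon_rmap(new, vma, haddr, RMAP_COMPOUND);

	/* New interface */
	folio_add_file_rmap_pte(folio, page, vma);
	folio_add_anon_rmap_pmd(folio, new, vma, haddr, RMAP_NONE);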

In the future, we might want "_pud" variants and eventually "_pmds"
variants for batching.

I ran some simple microbenchmarks on an Intel(R) Xeon(R) Silver 4210R:
measuring munmap(), fork(), COW, MADV_DONTNEED on each PTE ... and
PTE-remapping PMD-mapped THPs on 1 GiB of memory.

For small folios, there is barely a change (< 1%).

For PTE-mapped THP:
* PTE-remapping a PMD-mapped THP is more than 10% faster.
* fork() is more than 4% faster.
* MADV_DONTNEED is 2% faster.
* COW when writing only a single byte on a COW-shared PTE is 1% faster.
* munmap() barely changes (< 1%).

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

---

Based on current mm/mm-unstable. Compile-tested with/without THP on x86-64
and with defconfig on a bunch more. Tested on x86-64.

RFC -> v1:
* Rebased on top of mm-unstable (containing mTHP)
* Use switch()-case and __always_inline for helper functions
* Fixed some (intermittent) compile issues and some smaller stuff
* folio_try_dup_anon_rmap_[pte|ptes|pmd]() rewrite
* Pass nr_pages consistently as "int"
* Simplify sanity checks
* Added RBs

Cc: Andrew Morton <[email protected]>
Cc: "Matthew Wilcox (Oracle)" <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Ryan Roberts <[email protected]>
Cc: Yin Fengwei <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>


David Hildenbrand (39):
mm/rmap: rename hugepage_add* to hugetlb_add*
mm/rmap: introduce and use hugetlb_remove_rmap()
mm/rmap: introduce and use hugetlb_add_file_rmap()
mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()
mm/rmap: introduce and use hugetlb_try_share_anon_rmap()
mm/rmap: add hugetlb sanity checks
mm/rmap: convert folio_add_file_rmap_range() into
folio_add_file_rmap_[pte|ptes|pmd]()
mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()
mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()
mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()
mm/rmap: remove page_add_file_rmap()
mm/rmap: factor out adding folio mappings into __folio_add_rmap()
mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()
mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()
mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()
mm/rmap: remove page_add_anon_rmap()
mm/rmap: remove RMAP_COMPOUND
mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()
kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()
mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()
mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()
mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()
mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()
mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()
mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()
Documentation: stop referring to page_remove_rmap()
mm/rmap: remove page_remove_rmap()
mm/rmap: convert page_dup_file_rmap() to
folio_dup_file_rmap_[pte|ptes|pmd]()
mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()
mm/huge_memory: page_try_dup_anon_rmap() ->
folio_try_dup_anon_rmap_pmd()
mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()
mm/rmap: remove page_try_dup_anon_rmap()
mm: convert page_try_share_anon_rmap() to
folio_try_share_anon_rmap_[pte|pmd]()
mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

Documentation/mm/transhuge.rst | 4 +-
Documentation/mm/unevictable-lru.rst | 4 +-
include/linux/mm.h | 6 +-
include/linux/rmap.h | 398 +++++++++++++++++++-----
kernel/events/uprobes.c | 2 +-
mm/filemap.c | 10 +-
mm/gup.c | 2 +-
mm/huge_memory.c | 85 +++---
mm/hugetlb.c | 21 +-
mm/internal.h | 12 +-
mm/khugepaged.c | 17 +-
mm/ksm.c | 15 +-
mm/memory-failure.c | 4 +-
mm/memory.c | 60 ++--
mm/migrate.c | 12 +-
mm/migrate_device.c | 41 +--
mm/mmu_gather.c | 2 +-
mm/rmap.c | 433 ++++++++++++++++-----------
mm/swapfile.c | 2 +-
mm/userfaultfd.c | 2 +-
20 files changed, 740 insertions(+), 392 deletions(-)

--
2.43.0


2023-12-11 15:57:30

by David Hildenbrand

Subject: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
code from page_remove_rmap(). This effectively removes one check on the
small-folio path as well.

Note: all possible candidates that need care are page_remove_rmap() calls
that pass compound=true.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 5 +++++
mm/hugetlb.c | 4 ++--
mm/rmap.c | 17 ++++++++---------
3 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0bfea866f39b..d85bd1d4de04 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_remove_rmap(struct folio *folio)
+{
+ atomic_dec(&folio->_entire_mapcount);
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
if (compound) {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 305f3ca1dee6..ef48ae673890 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5676,7 +5676,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
make_pte_marker(PTE_MARKER_UFFD_WP),
sz);
hugetlb_count_sub(pages_per_huge_page(h), mm);
- page_remove_rmap(page, vma, true);
+ hugetlb_remove_rmap(page_folio(page));

spin_unlock(ptl);
tlb_remove_page_size(tlb, page, huge_page_size(h));
@@ -5987,7 +5987,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,

/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
- page_remove_rmap(&old_folio->page, vma, true);
+ hugetlb_remove_rmap(old_folio);
hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
diff --git a/mm/rmap.c b/mm/rmap.c
index 80d42c31281a..4e60c1f38eaa 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

VM_BUG_ON_PAGE(compound && !PageHead(page), page);

- /* Hugetlb pages are not counted in NR_*MAPPED */
- if (unlikely(folio_test_hugetlb(folio))) {
- /* hugetlb pages are always mapped with pmds */
- atomic_dec(&folio->_entire_mapcount);
- return;
- }
-
/* Is page being unmapped by PTE? Is this its last map to be removed? */
if (likely(!compound)) {
last = atomic_add_negative(-1, &page->_mapcount);
@@ -1846,7 +1839,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
dec_mm_counter(mm, mm_counter_file(&folio->page));
}
discard:
- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2199,7 +2195,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
*/
}

- page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+ if (unlikely(folio_test_hugetlb(folio)))
+ hugetlb_remove_rmap(folio);
+ else
+ page_remove_rmap(subpage, vma, false);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
--
2.43.0

2023-12-11 15:57:34

by David Hildenbrand

Subject: [PATCH v1 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_share_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that try_to_unmap_one() does not need care. Easy to spot because
among all that nasty hugetlb special-casing in that function, we're not
using set_huge_pte_at() on the anon path -- well, and that code assumes
that we would want to swap out.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 23 +++++++++++++++++++++++
mm/rmap.c | 15 ++++++++++-----
2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index ca42b3db5688..4c0650e9f6db 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -228,6 +228,29 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

+/* See page_try_share_anon_rmap() */
+static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
+
+ /* Paired with the memory barrier in try_grab_folio(). */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb();
+
+ if (unlikely(folio_maybe_dma_pinned(folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+
+ /*
+ * This is conceptually a smp_wmb() paired with the smp_rmb() in
+ * gup_must_unshare().
+ */
+ if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
+ smp_mb__after_atomic();
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index 4e60c1f38eaa..e210ac1b73de 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2147,13 +2147,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
!anon_exclusive, subpage);

/* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
- if (folio_test_hugetlb(folio))
+ if (folio_test_hugetlb(folio)) {
+ if (anon_exclusive &&
+ hugetlb_try_share_anon_rmap(folio)) {
set_huge_pte_at(mm, address, pvmw.pte,
pteval, hsz);
- else
- set_pte_at(mm, address, pvmw.pte, pteval);
+ ret = false;
+ page_vma_mapped_walk_done(&pvmw);
+ break;
+ }
+ } else if (anon_exclusive &&
+ page_try_share_anon_rmap(subpage)) {
+ set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
break;
--
2.43.0

2023-12-11 15:57:36

by David Hildenbrand

Subject: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

Let's just call it "hugetlb_".

Yes, it's all already inconsistent and confusing because we have a lot
of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
be confused with transparent huge pages, and it matches "hugetlb.c" and
"folio_test_hugetlb()". So let's minimize confusion in rmap code.

Reviewed-by: Muchun Song <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 ++--
mm/hugetlb.c | 8 ++++----
mm/migrate.c | 4 ++--
mm/rmap.c | 8 ++++----
4 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index af6a32b6f3e7..0bfea866f39b 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -208,9 +208,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

-void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
-void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
+void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

static inline void __page_dup_rmap(struct page *page, bool compound)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6feb3e0630d1..305f3ca1dee6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5285,7 +5285,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);

__folio_mark_uptodate(new_folio);
- hugepage_add_new_anon_rmap(new_folio, vma, addr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, addr);
if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
@@ -5988,7 +5988,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
/* Break COW or unshare */
huge_ptep_clear_flush(vma, haddr, ptep);
page_remove_rmap(&old_folio->page, vma, true);
- hugepage_add_new_anon_rmap(new_folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
if (huge_pte_uffd_wp(pte))
newpte = huge_pte_mkuffd_wp(newpte);
set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
@@ -6277,7 +6277,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
goto backout;

if (anon_rmap)
- hugepage_add_new_anon_rmap(folio, vma, haddr);
+ hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
page_dup_file_rmap(&folio->page, true);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
@@ -6732,7 +6732,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
if (folio_in_pagecache)
page_dup_file_rmap(&folio->page, true);
else
- hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
+ hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

/*
* For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a88334bb3c..4cb849fa0dd2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,

pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
if (folio_test_anon(folio))
- hugepage_add_anon_rmap(folio, vma, pvmw.address,
- rmap_flags);
+ hugetlb_add_anon_rmap(folio, vma, pvmw.address,
+ rmap_flags);
else
page_dup_file_rmap(new, true);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
diff --git a/mm/rmap.c b/mm/rmap.c
index 846fc79f3ca9..80d42c31281a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2625,8 +2625,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
*
* RMAP_COMPOUND is ignored.
*/
-void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
+void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags)
{
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

@@ -2637,8 +2637,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
PageAnonExclusive(&folio->page), folio);
}

-void hugepage_add_new_anon_rmap(struct folio *folio,
- struct vm_area_struct *vma, unsigned long address)
+void hugetlb_add_new_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma, unsigned long address)
{
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
--
2.43.0

2023-12-11 15:57:36

by David Hildenbrand

Subject: [PATCH v1 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

Right now we're using page_dup_file_rmap() in some cases where "ordinary"
rmap code would have used page_add_file_rmap(). So let's introduce and
use hugetlb_add_file_rmap() instead. We won't be adding a
"hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
doing the same: "dup" is just an optimization for "add".

What remains is a single page_dup_file_rmap() call in fork() code.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/hugetlb.c | 6 +++---
mm/migrate.c | 2 +-
3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index d85bd1d4de04..91178d1aa028 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+static inline void hugetlb_add_file_rmap(struct folio *folio)
+{
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+
+ atomic_inc(&folio->_entire_mapcount);
+}
+
static inline void hugetlb_remove_rmap(struct folio *folio)
{
atomic_dec(&folio->_entire_mapcount);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ef48ae673890..57e898187931 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5408,7 +5408,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
* sleep during the process.
*/
if (!folio_test_anon(pte_folio)) {
- page_dup_file_rmap(&pte_folio->page, true);
+ hugetlb_add_file_rmap(pte_folio);
} else if (page_try_dup_anon_rmap(&pte_folio->page,
true, src_vma)) {
pte_t src_pte_old = entry;
@@ -6279,7 +6279,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
if (anon_rmap)
hugetlb_add_new_anon_rmap(folio, vma, haddr);
else
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
&& (vma->vm_flags & VM_SHARED)));
/*
@@ -6730,7 +6730,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
goto out_release_unlock;

if (folio_in_pagecache)
- page_dup_file_rmap(&folio->page, true);
+ hugetlb_add_file_rmap(folio);
else
hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);

diff --git a/mm/migrate.c b/mm/migrate.c
index 4cb849fa0dd2..de9d94b99ab7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
hugetlb_add_anon_rmap(folio, vma, pvmw.address,
rmap_flags);
else
- page_dup_file_rmap(new, true);
+ hugetlb_add_file_rmap(folio);
set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
psize);
} else
--
2.43.0

2023-12-11 15:57:52

by David Hildenbrand

Subject: [PATCH v1 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
For example, hugetlb currently only supports entire mappings, and treats
any mapping as mapped using a single "logical PTE". Let's move it out
of the way so we can overhaul our "ordinary" rmap
implementation/interface.

So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
hugetlb handling use dedicated hugetlb_* rmap functions.

Note that is_device_private_page() does not apply to hugetlb.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 12 +++++++++---
include/linux/rmap.h | 15 +++++++++++++++
mm/hugetlb.c | 3 +--
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b72bf25a45cf..ae547b62f325 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
*
* The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
*/
-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
+static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct folio *folio)
{
VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));

if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
return false;

- return page_maybe_dma_pinned(page);
+ return folio_maybe_dma_pinned(folio);
+}
+
+static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
+ struct page *page)
+{
+ return folio_needs_cow_for_dma(vma, page_folio(page));
}

/**
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 91178d1aa028..ca42b3db5688 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -213,6 +213,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

+/* See page_try_dup_anon_rmap() */
+static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
+ struct vm_area_struct *vma)
+{
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+
+ if (PageAnonExclusive(&folio->page)) {
+ if (unlikely(folio_needs_cow_for_dma(vma, folio)))
+ return -EBUSY;
+ ClearPageAnonExclusive(&folio->page);
+ }
+ atomic_inc(&folio->_entire_mapcount);
+ return 0;
+}
+
static inline void hugetlb_add_file_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 57e898187931..378e460a6ab4 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
if (!folio_test_anon(pte_folio)) {
hugetlb_add_file_rmap(pte_folio);
- } else if (page_try_dup_anon_rmap(&pte_folio->page,
- true, src_vma)) {
+ } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
pte_t src_pte_old = entry;
struct folio *new_folio;

--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
perform some folio conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 14 ++++++++------
1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 6a5540ba3c65..70754fd65788 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page)
static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
unsigned long addr, struct page *page, pgprot_t prot)
{
+ struct folio *folio = page_folio(page);
+
if (!pte_none(ptep_get(pte)))
return -EBUSY;
/* Ok, finally just insert the thing.. */
- get_page(page);
+ folio_get(folio);
inc_mm_counter(vma->vm_mm, mm_counter_file(page));
- page_add_file_rmap(page, vma, false);
+ folio_add_file_rmap_pte(folio, page, vma);
set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
return 0;
}
@@ -4409,6 +4411,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)

vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = vmf->vma;
bool write = vmf->flags & FAULT_FLAG_WRITE;
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
@@ -4418,8 +4421,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
return ret;

- page = compound_head(page);
- if (compound_order(page) != HPAGE_PMD_ORDER)
+ if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
return ret;

/*
@@ -4428,7 +4430,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
* check. This kind of THP just can be PTE mapped. Access to
* the corrupted subpage should trigger SIGBUS as expected.
*/
- if (unlikely(PageHasHWPoisoned(page)))
+ if (unlikely(folio_test_has_hwpoisoned(folio)))
return ret;

/*
@@ -4452,7 +4454,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);

add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
- page_add_file_rmap(page, vma, true);
+ folio_add_file_rmap_pmd(folio, page, vma);

/*
* deposit and withdraw with pmd lock held
--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

Let's make sure we end up with the right folios in the right functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 7 +++++++
mm/rmap.c | 6 ++++++
2 files changed, 13 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 4c0650e9f6db..e3857d26b944 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -217,6 +217,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

if (PageAnonExclusive(&folio->page)) {
@@ -231,6 +232,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
/* See page_try_share_anon_rmap() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);

@@ -253,6 +255,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)

static inline void hugetlb_add_file_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -260,11 +263,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)

static inline void hugetlb_remove_rmap(struct folio *folio)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
atomic_dec(&folio->_entire_mapcount);
}

static inline void __page_dup_rmap(struct page *page, bool compound)
{
+ VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+
if (compound) {
struct folio *folio = (struct folio *)page;

diff --git a/mm/rmap.c b/mm/rmap.c
index e210ac1b73de..41597da14f26 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1343,6 +1343,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
{
int nr = folio_nr_pages(folio);

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_VMA(address < vma->vm_start ||
address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
__folio_set_swapbacked(folio);
@@ -1395,6 +1396,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);

/* Is page being mapped by PTE? Is this its first map to be added? */
@@ -1480,6 +1482,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool last;
enum node_stat_item idx;

+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
VM_BUG_ON_PAGE(compound && !PageHead(page), page);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
@@ -2632,6 +2635,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

atomic_inc(&folio->_entire_mapcount);
@@ -2644,6 +2648,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
void hugetlb_add_new_anon_rmap(struct folio *folio,
struct vm_area_struct *vma, unsigned long address)
{
+ VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
+
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
--
2.43.0

2023-12-11 15:57:59

by David Hildenbrand

Subject: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

Let's get rid of the compound parameter and instead define explicitly
which mappings we're adding. That is more future-proof, easier to read,
and harder to mess up.

Use an enum to express the granularity internally. Make the compiler
always special-case on the granularity by using __always_inline. Replace
the "compound" check by a switch-case that will be removed by the
compiler completely.

Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
folio_test_pmd_mappable() check by a config check in the caller and
sanity checks. Convert the single user of folio_add_file_rmap_range().

This function design can later easily be extended to PUDs and to batch
PMDs. Note that for now we don't support anything bigger than
PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
will catch if that ever changes.

Next up is removing page_remove_rmap() along with its "compound"
parameter and similarly converting all other rmap functions.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 47 +++++++++++++++++++++++++--
mm/memory.c | 2 +-
mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
3 files changed, 95 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e3857d26b944..1753900f4aed 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
*/
#define RMAP_COMPOUND ((__force rmap_t)BIT(1))

+/*
+ * Internally, we're using an enum to specify the granularity. Usually,
+ * we make the compiler create specialized variants for the different
+ * granularity.
+ */
+enum rmap_mode {
+ RMAP_MODE_PTE = 0,
+ RMAP_MODE_PMD,
+};
+
+static inline void __folio_rmap_sanity_checks(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
+{
+ /* hugetlb folios are handled separately. */
+ VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ !folio_test_large_rmappable(folio), folio);
+
+ VM_WARN_ON_ONCE(nr_pages <= 0);
+ VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
+ VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
+
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ break;
+ case RMAP_MODE_PMD:
+ /*
+ * We don't support folios larger than a single PMD yet. So
+ * when RMAP_MODE_PMD is set, we assume that we are creating
+ * a single "entire" mapping of the folio.
+ */
+ VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
+ VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
+ break;
+ default:
+ VM_WARN_ON_ONCE(true);
+ }
+}
+
/*
* rmap interfaces called when adding or removing pte of page
*/
@@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
void page_add_file_rmap(struct page *, struct vm_area_struct *,
bool compound);
-void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
- struct vm_area_struct *, bool compound);
+void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_add_file_rmap_pte(folio, page, vma) \
+ folio_add_file_rmap_ptes(folio, page, 1, vma)
+void folio_add_file_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);

diff --git a/mm/memory.c b/mm/memory.c
index 8f0b936b90b5..6a5540ba3c65 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
folio_add_lru_vma(folio, vma);
} else {
add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
- folio_add_file_rmap_range(folio, page, nr, vma, false);
+ folio_add_file_rmap_ptes(folio, page, nr, vma);
}
set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);

diff --git a/mm/rmap.c b/mm/rmap.c
index 41597da14f26..4f30930a1162 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
}

-/**
- * folio_add_file_rmap_range - add pte mapping to page range of a folio
- * @folio: The folio to add the mapping to
- * @page: The first page to add
- * @nr_pages: The number of pages which will be mapped
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The page range of folio is defined by [first_page, first_page + nr_pages)
- *
- * The caller needs to hold the pte lock.
- */
-void folio_add_file_rmap_range(struct folio *folio, struct page *page,
- unsigned int nr_pages, struct vm_area_struct *vma,
- bool compound)
+static __always_inline void __folio_add_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_mode mode)
{
atomic_t *mapped = &folio->_nr_pages_mapped;
unsigned int nr_pmdmapped = 0, first;
int nr = 0;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
+ VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
+ switch (mode) {
+ case RMAP_MODE_PTE:
do {
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
@@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
if (first)
nr++;
} while (page++, --nr_pages > 0);
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
-
+ break;
+ case RMAP_MODE_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped)
@@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages that will be mapped using PTEs
+ * @vma: The vm area in which the mappings are added
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
@@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
- unsigned int nr_pages;

VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);

if (likely(!compound))
- nr_pages = 1;
+ folio_add_file_rmap_pte(folio, page, vma);
else
- nr_pages = folio_nr_pages(folio);
-
- folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
+ folio_add_file_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0

2023-12-11 15:58:00

by David Hildenbrand

Subject: [PATCH v1 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

Let's convert remove_migration_pmd() and while at it, perform some folio
conversion.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3a387c6f18b6..1f5634b2f374 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3577,6 +3577,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,

void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
{
+ struct folio *folio = page_folio(new);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3588,7 +3589,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
return;

entry = pmd_to_swp_entry(*pvmw->pmd);
- get_page(new);
+ folio_get(folio);
pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
if (pmd_swp_soft_dirty(*pvmw->pmd))
pmde = pmd_mksoft_dirty(pmde);
@@ -3599,10 +3600,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
if (!is_migration_entry_young(entry))
pmde = pmd_mkold(pmde);
/* NOTE: this may contain setting soft-dirty on some archs */
- if (PageDirty(new) && is_migration_entry_dirty(entry))
+ if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
pmde = pmd_mkdirty(pmde);

- if (PageAnon(new)) {
+ if (folio_test_anon(folio)) {
rmap_t rmap_flags = RMAP_COMPOUND;

if (!is_readable_migration_entry(entry))
@@ -3610,9 +3611,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)

page_add_anon_rmap(new, vma, haddr, rmap_flags);
} else {
- page_add_file_rmap(new, vma, true);
+ folio_add_file_rmap_pmd(folio, new, vma);
}
- VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
+ VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
set_pmd_at(mm, haddr, pvmw->pmd, pmde);

/* No need to invalidate - it was non-present before */
--
2.43.0

2023-12-11 15:58:04

by David Hildenbrand

Subject: [PATCH v1 12/39] mm/rmap: remove page_add_file_rmap()

All users are gone, let's remove it.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 21 ---------------------
2 files changed, 23 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1753900f4aed..7198905dc8be 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -240,8 +240,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);
-void page_add_file_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_add_file_rmap_pte(folio, page, vma) \
diff --git a/mm/rmap.c b/mm/rmap.c
index 4f30930a1162..2ff2f11275e5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1467,27 +1467,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_add_file_rmap - add pte mapping to a file page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @compound: charge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
-
- if (likely(!compound))
- folio_add_file_rmap_pte(folio, page, vma);
- else
- folio_add_file_rmap_pmd(folio, page, vma);
-}
-
/**
* page_remove_rmap - take down pte mapping from a page
* @page: page to remove mapping from
--
2.43.0

2023-12-11 15:58:07

by David Hildenbrand

Subject: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

Let's factor it out to prepare for reuse as we convert
page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().

Make the compiler always special-case on the granularity by using
__always_inline.

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 45 insertions(+), 36 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 2ff2f11275e5..c5761986a411 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
return mapcount;
}

+static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode,
+ unsigned int *nr_pmdmapped)
+{
+ atomic_t *mapped = &folio->_nr_pages_mapped;
+ int first, nr = 0;
+
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ /* Is page being mapped by PTE? Is this its first map to be added? */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ first = atomic_inc_and_test(&page->_mapcount);
+ if (first && folio_test_large(folio)) {
+ first = atomic_inc_return_relaxed(mapped);
+ first = (first < COMPOUND_MAPPED);
+ }
+
+ if (first)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ first = atomic_inc_and_test(&folio->_entire_mapcount);
+ if (first) {
+ nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
+ if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ *nr_pmdmapped = folio_nr_pages(folio);
+ nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
+ /* Raced ahead of a remove and another add? */
+ if (unlikely(nr < 0))
+ nr = 0;
+ } else {
+ /* Raced ahead of a remove of COMPOUND_MAPPED */
+ nr = 0;
+ }
+ }
+ break;
+ }
+ return nr;
+}
+
/**
* folio_move_anon_rmap - move a folio to our anon_vma
* @folio: The folio to move to our anon_vma
@@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_mode mode)
{
- atomic_t *mapped = &folio->_nr_pages_mapped;
- unsigned int nr_pmdmapped = 0, first;
- int nr = 0;
+ unsigned int nr, nr_pmdmapped = 0;

VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
- __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
-
- /* Is page being mapped by PTE? Is this its first map to be added? */
- switch (mode) {
- case RMAP_MODE_PTE:
- do {
- first = atomic_inc_and_test(&page->_mapcount);
- if (first && folio_test_large(folio)) {
- first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
- }
-
- if (first)
- nr++;
- } while (page++, --nr_pages > 0);
- break;
- case RMAP_MODE_PMD:
- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- break;
- }

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
--
2.43.0

2023-12-11 15:58:08

by David Hildenbrand

Subject: [PATCH v1 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert mfill_atomic_install_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/userfaultfd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9ec814e47e99..330a481a1654 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
/* Usually, cache pages are already added to LRU */
if (newly_allocated)
folio_add_lru(folio);
- page_add_file_rmap(page, dst_vma, false);
+ folio_add_file_rmap_pte(folio, page, dst_vma);
} else {
page_add_new_anon_rmap(page, dst_vma, dst_addr);
folio_add_lru_vma(folio, dst_vma);
--
2.43.0

2023-12-11 15:58:27

by David Hildenbrand

Subject: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
replace page_add_anon_rmap() next.

Make the compiler always special-case on the granularity by using
__always_inline.

Note that the new functions ignore the RMAP_COMPOUND flag, which we will
remove as soon as page_add_anon_rmap() is gone.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 +++
mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 7198905dc8be..3b5357cb1c09 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
* rmap interfaces called when adding or removing pte of page
*/
void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
+void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
+#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
+ folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
+void folio_add_anon_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *, unsigned long address, rmap_t flags);
void page_add_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index c5761986a411..7787499fa2ad 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
{
struct folio *folio = page_folio(page);
- atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool compound = flags & RMAP_COMPOUND;
- bool first;

- /* Is page being mapped by PTE? Is this its first map to be added? */
- if (likely(!compound)) {
- first = atomic_inc_and_test(&page->_mapcount);
- nr = first;
- if (first && folio_test_large(folio)) {
- nr = atomic_inc_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ if (likely(!(flags & RMAP_COMPOUND)))
+ folio_add_anon_rmap_pte(folio, page, vma, address, flags);
+ else
+ folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
+}

- first = atomic_inc_and_test(&folio->_entire_mapcount);
- if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
- nr_pmdmapped = folio_nr_pages(folio);
- nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
- /* Raced ahead of a remove and another add? */
- if (unlikely(nr < 0))
- nr = 0;
- } else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
- nr = 0;
- }
- }
- }
+static __always_inline void __folio_add_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ unsigned long address, rmap_t flags, enum rmap_mode mode)
+{
+ unsigned int i, nr, nr_pmdmapped = 0;

+ nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
if (nr_pmdmapped)
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
if (nr)
@@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
* folio->index right when not given the address of the head
* page.
*/
- VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
+ VM_WARN_ON_FOLIO(folio_test_large(folio) &&
+ mode != RMAP_MODE_PMD, folio);
__folio_set_anon(folio, vma, address,
!!(flags & RMAP_EXCLUSIVE));
} else if (likely(!folio_test_ksm(folio))) {
__page_check_anon_rmap(folio, page, vma, address);
}
- if (flags & RMAP_EXCLUSIVE)
- SetPageAnonExclusive(page);
- /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
- VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
- (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
- PageAnonExclusive(page), folio);
+
+ if (flags & RMAP_EXCLUSIVE) {
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ for (i = 0; i < nr_pages; i++)
+ SetPageAnonExclusive(page + i);
+ break;
+ case RMAP_MODE_PMD:
+ SetPageAnonExclusive(page);
+ break;
+ }
+ }
+ for (i = 0; i < nr_pages; i++) {
+ struct page *cur_page = page + i;
+
+ /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
+ VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
+ (folio_test_large(folio) &&
+ folio_entire_mapcount(folio) > 1)) &&
+ PageAnonExclusive(cur_page), folio);
+ }

/*
* For large folio, only mlock it if it's fully mapped to VMA. It's
@@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
mlock_vma_folio(folio, vma);
}

+/**
+ * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
+ * @folio: The folio to add the mappings to
+ * @page: The first page to add
+ * @nr_pages: The number of pages which will be mapped
+ * @vma: The vm area in which the mappings are added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + nr_pages)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting,
+ * and to ensure that an anon folio is not being upgraded racily to a KSM folio
+ * (but KSM folios are never downgraded).
+ */
+void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma, unsigned long address,
+ rmap_t flags)
+{
+ __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
+ RMAP_MODE_PTE);
+}
+
+/**
+ * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
+ * @folio: The folio to add the mapping to
+ * @page: The first page to add
+ * @vma: The vm area in which the mapping is added
+ * @address: The user virtual address of the first page to map
+ * @flags: The rmap flags
+ *
+ * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock, and the page must be locked in
+ * the anon_vma case: to serialize mapping,index checking after setting.
+ */
+void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma, unsigned long address, rmap_t flags)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/**
* folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
* @folio: The folio to add the mapping to.
--
2.43.0

2023-12-11 15:58:28

by David Hildenbrand

Subject: [PATCH v1 16/39] mm/huge_memory: page_add_anon_rmap() -> folio_add_anon_rmap_pmd()

Let's convert remove_migration_pmd(). No need to set RMAP_COMPOUND, which
we will remove soon.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 82ad68fe0d12..b03374d1bb94 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3611,12 +3611,12 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
pmde = pmd_mkdirty(pmde);

if (folio_test_anon(folio)) {
- rmap_t rmap_flags = RMAP_COMPOUND;
+ rmap_t rmap_flags = RMAP_NONE;

if (!is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(new, vma, haddr, rmap_flags);
+ folio_add_anon_rmap_pmd(folio, new, vma, haddr, rmap_flags);
} else {
folio_add_file_rmap_pmd(folio, new, vma);
}
--
2.43.0

2023-12-11 15:58:36

by David Hildenbrand

Subject: [PATCH v1 19/39] mm/swapfile: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert unuse_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 8be70912e298..25f53bec5097 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1805,7 +1805,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
if (pte_swp_exclusive(old_pte))
rmap_flags |= RMAP_EXCLUSIVE;

- page_add_anon_rmap(page, vma, addr, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);
} else { /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, addr);
lru_cache_add_inactive_or_unevictable(page, vma);
--
2.43.0

2023-12-11 15:58:39

by David Hildenbrand

Subject: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.

While at it, use more folio operations (but only in the code branch we're
touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
manually setting PageAnonExclusive.

We should never see non-anon pages on that branch: otherwise, the
existing page_add_anon_rmap() call would have been flawed already.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 23 +++++++++++++++--------
1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f5634b2f374..82ad68fe0d12 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
unsigned long haddr, bool freeze)
{
struct mm_struct *mm = vma->vm_mm;
+ struct folio *folio;
struct page *page;
pgtable_t pgtable;
pmd_t old_pmd, _pmd;
@@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
page = pmd_page(old_pmd);
+ folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
dirty = true;
- SetPageDirty(page);
+ folio_set_dirty(folio);
}
write = pmd_write(old_pmd);
young = pmd_young(old_pmd);
soft_dirty = pmd_soft_dirty(old_pmd);
uffd_wp = pmd_uffd_wp(old_pmd);

- VM_BUG_ON_PAGE(!page_count(page), page);
+ VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

/*
* Without "freeze", we'll simply split the PMD, propagating the
@@ -2519,11 +2522,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
*
* See page_try_share_anon_rmap(): invalidate PMD first.
*/
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = PageAnonExclusive(page);
if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
freeze = false;
- if (!freeze)
- page_ref_add(page, HPAGE_PMD_NR - 1);
+ if (!freeze) {
+ rmap_t rmap_flags = RMAP_NONE;
+
+ folio_ref_add(folio, HPAGE_PMD_NR - 1);
+ if (anon_exclusive)
+ rmap_flags |= RMAP_EXCLUSIVE;
+ folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
+ vma, haddr, rmap_flags);
+ }
}

/*
@@ -2566,8 +2576,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
if (write)
entry = pte_mkwrite(entry, vma);
- if (anon_exclusive)
- SetPageAnonExclusive(page + i);
if (!young)
entry = pte_mkold(entry);
/* NOTE: this may set soft-dirty too on some archs */
@@ -2577,7 +2585,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
entry = pte_mksoft_dirty(entry);
if (uffd_wp)
entry = pte_mkuffd_wp(entry);
- page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
}
VM_BUG_ON(!pte_none(ptep_get(pte)));
set_pte_at(mm, addr, pte, entry);
--
2.43.0

2023-12-11 15:58:52

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 20/39] mm/memory: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert restore_exclusive_pte() and do_swap_page(). While at it,
perform some folio conversion in restore_exclusive_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 70754fd65788..97e064883992 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -710,6 +710,7 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
struct page *page, unsigned long address,
pte_t *ptep)
{
+ struct folio *folio = page_folio(page);
pte_t orig_pte;
pte_t pte;
swp_entry_t entry;
@@ -725,14 +726,15 @@ static void restore_exclusive_pte(struct vm_area_struct *vma,
else if (is_writable_device_exclusive_entry(entry))
pte = maybe_mkwrite(pte_mkdirty(pte), vma);

- VM_BUG_ON(pte_write(pte) && !(PageAnon(page) && PageAnonExclusive(page)));
+ VM_BUG_ON_FOLIO(pte_write(pte) && (!folio_test_anon(folio) &&
+ PageAnonExclusive(page)), folio);

/*
* No need to take a page reference as one was already
* created when the swap entry was made.
*/
- if (PageAnon(page))
- page_add_anon_rmap(page, vma, address, RMAP_NONE);
+ if (folio_test_anon(folio))
+ folio_add_anon_rmap_pte(folio, page, vma, address, RMAP_NONE);
else
/*
* Currently device exclusive access only supports anonymous
@@ -4073,7 +4075,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
page_add_new_anon_rmap(page, vma, vmf->address);
folio_add_lru_vma(folio, vma);
} else {
- page_add_anon_rmap(page, vma, vmf->address, rmap_flags);
+ folio_add_anon_rmap_pte(folio, page, vma, vmf->address,
+ rmap_flags);
}

VM_BUG_ON(!folio_test_anon(folio) ||
--
2.43.0

2023-12-11 15:58:55

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 22/39] mm/rmap: remove RMAP_COMPOUND

No longer used, let's remove it and clarify RMAP_NONE/RMAP_EXCLUSIVE a
bit.
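
As an illustration (taken from the unuse_pte() conversion earlier in this
series), the remaining flags are purely about anon exclusivity:

        rmap_t rmap_flags = RMAP_NONE;

        if (pte_swp_exclusive(old_pte))
                rmap_flags |= RMAP_EXCLUSIVE;
        folio_add_anon_rmap_pte(folio, page, vma, addr, rmap_flags);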

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 12 +++---------
mm/rmap.c | 2 --
2 files changed, 3 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd4edae4dbe7..0acebe41ab8e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -177,20 +177,14 @@ struct anon_vma *folio_get_anon_vma(struct folio *folio);
typedef int __bitwise rmap_t;

/*
- * No special request: if the page is a subpage of a compound page, it is
- * mapped via a PTE. The mapped (sub)page is possibly shared between processes.
+ * No special request: A mapped anonymous (sub)page is possibly shared between
+ * processes.
*/
#define RMAP_NONE ((__force rmap_t)0)

-/* The (sub)page is exclusive to a single process. */
+/* The anonymous (sub)page is exclusive to a single process. */
#define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0))

-/*
- * The compound page is not mapped via PTEs, but instead via a single PMD and
- * should be accounted accordingly.
- */
-#define RMAP_COMPOUND ((__force rmap_t)BIT(1))
-
/*
* Internally, we're using an enum to specify the granularity. Usually,
* we make the compiler create specialized variants for the different
diff --git a/mm/rmap.c b/mm/rmap.c
index 83cba8909848..9212726268ba 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2663,8 +2663,6 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
* The following two functions are for anonymous (private mapped) hugepages.
* Unlike common anonymous pages, anonymous hugepages have no accounting code
* and no lru code, because we handle hugepages differently from common pages.
- *
- * RMAP_COMPOUND is ignored.
*/
void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
unsigned long address, rmap_t flags)
--
2.43.0

2023-12-11 15:59:10

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 26/39] mm/khugepaged: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __collapse_huge_page_copy_succeeded() and
collapse_pte_mapped_thp(). While at it, perform some more folio
conversion in __collapse_huge_page_copy_succeeded().

We can get rid of release_pte_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/khugepaged.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index de174d049e71..4d90c9548ec9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -494,11 +494,6 @@ static void release_pte_folio(struct folio *folio)
folio_putback_lru(folio);
}

-static void release_pte_page(struct page *page)
-{
- release_pte_folio(page_folio(page));
-}
-
static void release_pte_pages(pte_t *pte, pte_t *_pte,
struct list_head *compound_pagelist)
{
@@ -687,6 +682,7 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
spinlock_t *ptl,
struct list_head *compound_pagelist)
{
+ struct folio *src_folio;
struct page *src_page;
struct page *tmp;
pte_t *_pte;
@@ -708,16 +704,17 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
}
} else {
src_page = pte_page(pteval);
- if (!PageCompound(src_page))
- release_pte_page(src_page);
+ src_folio = page_folio(src_page);
+ if (!folio_test_large(src_folio))
+ release_pte_folio(src_folio);
/*
* ptl mostly unnecessary, but preempt has to
* be disabled to update the per-cpu stats
- * inside page_remove_rmap().
+ * inside folio_remove_rmap_pte().
*/
spin_lock(ptl);
ptep_clear(vma->vm_mm, address, _pte);
- page_remove_rmap(src_page, vma, false);
+ folio_remove_rmap_pte(src_folio, src_page, vma);
spin_unlock(ptl);
free_page_and_swap_cache(src_page);
}
@@ -1624,7 +1621,7 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
* PTE dirty? Shmem page is already dirty; file is read-only.
*/
ptep_clear(mm, addr, pte);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
nr_ptes++;
}

--
2.43.0

2023-12-11 15:59:11

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 24/39] kernel/events/uprobes: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert __replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
kernel/events/uprobes.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 435aac1d8c27..16731d240e16 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -198,7 +198,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
set_pte_at_notify(mm, addr, pvmw.pte,
mk_pte(new_page, vma->vm_page_prot));

- page_remove_rmap(old_page, vma, false);
+ folio_remove_rmap_pte(old_folio, old_page, vma);
if (!folio_mapped(old_folio))
folio_free_swap(old_folio);
page_vma_mapped_walk_done(&pvmw);
--
2.43.0

2023-12-11 15:59:11

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 17/39] mm/migrate: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert remove_migration_pte().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index efc19f53b05e..0e78680589bc 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -259,8 +259,8 @@ static bool remove_migration_pte(struct folio *folio,
#endif
{
if (folio_test_anon(folio))
- page_add_anon_rmap(new, vma, pvmw.address,
- rmap_flags);
+ folio_add_anon_rmap_pte(folio, new, vma,
+ pvmw.address, rmap_flags);
else
folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
--
2.43.0

2023-12-11 15:59:16

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 25/39] mm/huge_memory: page_remove_rmap() -> folio_remove_rmap_pmd()

Let's convert zap_huge_pmd() and set_pmd_migration_entry(). While at it,
perform some more folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b03374d1bb94..cfaa8b823015 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1898,7 +1898,7 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,

if (pmd_present(orig_pmd)) {
page = pmd_page(orig_pmd);
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(page_folio(page), page, vma);
VM_BUG_ON_PAGE(page_mapcount(page) < 0, page);
VM_BUG_ON_PAGE(!PageHead(page), page);
} else if (thp_migration_supported()) {
@@ -2433,12 +2433,13 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
page = pfn_swap_entry_to_page(entry);
} else {
page = pmd_page(old_pmd);
- if (!PageDirty(page) && pmd_dirty(old_pmd))
- set_page_dirty(page);
- if (!PageReferenced(page) && pmd_young(old_pmd))
- SetPageReferenced(page);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio = page_folio(page);
+ if (!folio_test_dirty(folio) && pmd_dirty(old_pmd))
+ folio_set_dirty(folio);
+ if (!folio_test_referenced(folio) && pmd_young(old_pmd))
+ folio_set_referenced(folio);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
}
add_mm_counter(mm, mm_counter_file(page), -HPAGE_PMD_NR);
return;
@@ -2593,7 +2594,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
pte_unmap(pte - 1);

if (!pmd_migration)
- page_remove_rmap(page, vma, true);
+ folio_remove_rmap_pmd(folio, page, vma);
if (freeze)
put_page(page);

@@ -3536,6 +3537,7 @@ late_initcall(split_huge_pages_debugfs);
int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
struct page *page)
{
+ struct folio *folio = page_folio(page);
struct vm_area_struct *vma = pvmw->vma;
struct mm_struct *mm = vma->vm_mm;
unsigned long address = pvmw->address;
@@ -3551,14 +3553,14 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

/* See page_try_share_anon_rmap(): invalidate PMD first. */
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
if (anon_exclusive && page_try_share_anon_rmap(page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}

if (pmd_dirty(pmdval))
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (pmd_write(pmdval))
entry = make_writable_migration_entry(page_to_pfn(page));
else if (anon_exclusive)
@@ -3575,8 +3577,8 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
if (pmd_uffd_wp(pmdval))
pmdswp = pmd_swp_mkuffd_wp(pmdswp);
set_pmd_at(mm, address, pvmw->pmd, pmdswp);
- page_remove_rmap(page, vma, true);
- put_page(page);
+ folio_remove_rmap_pmd(folio, page, vma);
+ folio_put(folio);
trace_set_migration_pmd(address, pmd_val(pmdswp));

return 0;
--
2.43.0

2023-12-11 15:59:27

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 28/39] mm/memory: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert zap_pte_range() and closely-related
tlb_flush_rmap_batch(). While at it, perform some more folio conversion
in zap_pte_range().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 23 +++++++++++++----------
mm/mmu_gather.c | 2 +-
2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 97e064883992..9a5724cf895f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1434,6 +1434,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
arch_enter_lazy_mmu_mode();
do {
pte_t ptent = ptep_get(pte);
+ struct folio *folio;
struct page *page;

if (pte_none(ptent))
@@ -1459,21 +1460,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
continue;
}

+ folio = page_folio(page);
delay_rmap = 0;
- if (!PageAnon(page)) {
+ if (!folio_test_anon(folio)) {
if (pte_dirty(ptent)) {
- set_page_dirty(page);
+ folio_set_dirty(folio);
if (tlb_delay_rmap(tlb)) {
delay_rmap = 1;
force_flush = 1;
}
}
if (pte_young(ptent) && likely(vma_has_recency(vma)))
- mark_page_accessed(page);
+ folio_mark_accessed(folio);
}
rss[mm_counter(page)]--;
if (!delay_rmap) {
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
}
@@ -1489,6 +1491,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
if (is_device_private_entry(entry) ||
is_device_exclusive_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);
if (unlikely(!should_zap_page(details, page)))
continue;
/*
@@ -1500,8 +1503,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
WARN_ON_ONCE(!vma_is_anonymous(vma));
rss[mm_counter(page)]--;
if (is_device_private_entry(entry))
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);
} else if (!non_swap_entry(entry)) {
/* Genuine swap entry, hence a private anon page */
if (!should_zap_cows(details))
@@ -3220,10 +3223,10 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* threads.
*
* The critical issue is to order this
- * page_remove_rmap with the ptp_clear_flush above.
- * Those stores are ordered by (if nothing else,)
+ * folio_remove_rmap_pte() with the ptp_clear_flush
+ * above. Those stores are ordered by (if nothing else,)
* the barrier present in the atomic_add_negative
- * in page_remove_rmap.
+ * in folio_remove_rmap_pte();
*
* Then the TLB flush in ptep_clear_flush ensures that
* no process can access the old page before the
@@ -3232,7 +3235,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
* mapcount is visible. So transitively, TLBs to
* old page will be flushed before it can be reused.
*/
- page_remove_rmap(vmf->page, vma, false);
+ folio_remove_rmap_pte(old_folio, vmf->page, vma);
}

/* Free the old page.. */
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 4f559f4ddd21..604ddf08affe 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -55,7 +55,7 @@ static void tlb_flush_rmap_batch(struct mmu_gather_batch *batch, struct vm_area_

if (encoded_page_flags(enc)) {
struct page *page = encoded_page_ptr(enc);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(page_folio(page), page, vma);
}
}
}
--
2.43.0

2023-12-11 15:59:32

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 18/39] mm/ksm: page_add_anon_rmap() -> folio_add_anon_rmap_pte()

Let's convert replace_page(). While at it, perform some folio
conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index b93389a3780e..2b6888ad1470 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1199,6 +1199,7 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
static int replace_page(struct vm_area_struct *vma, struct page *page,
struct page *kpage, pte_t orig_pte)
{
+ struct folio *kfolio = page_folio(kpage);
struct mm_struct *mm = vma->vm_mm;
struct folio *folio;
pmd_t *pmd;
@@ -1238,15 +1239,16 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
goto out_mn;
}
VM_BUG_ON_PAGE(PageAnonExclusive(page), page);
- VM_BUG_ON_PAGE(PageAnon(kpage) && PageAnonExclusive(kpage), kpage);
+ VM_BUG_ON_FOLIO(folio_test_anon(kfolio) && PageAnonExclusive(kpage),
+ kfolio);

/*
* No need to check ksm_use_zero_pages here: we can only have a
* zero_page here if ksm_use_zero_pages was enabled already.
*/
if (!is_zero_pfn(page_to_pfn(kpage))) {
- get_page(kpage);
- page_add_anon_rmap(kpage, vma, addr, RMAP_NONE);
+ folio_get(kfolio);
+ folio_add_anon_rmap_pte(kfolio, kpage, vma, addr, RMAP_NONE);
newpte = mk_pte(kpage, vma->vm_page_prot);
} else {
/*
--
2.43.0

2023-12-11 15:59:33

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 27/39] mm/ksm: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert replace_page().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/ksm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 2b6888ad1470..b3d0cfaa2533 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1279,7 +1279,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page,
set_pte_at_notify(mm, addr, ptep, newpte);

folio = page_folio(page);
- page_remove_rmap(page, vma, false);
+ folio_remove_rmap_pte(folio, page, vma);
if (!folio_mapped(folio))
folio_free_swap(folio);
folio_put(folio);
--
2.43.0

2023-12-11 15:59:40

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 29/39] mm/migrate_device: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert migrate_vma_collect_pmd(). While at it, perform more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate_device.c | 39 +++++++++++++++++++++------------------
1 file changed, 21 insertions(+), 18 deletions(-)

diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 8ac1f79f754a..c51c99151ebb 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -107,6 +107,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

for (; addr < end; addr += PAGE_SIZE, ptep++) {
unsigned long mpfn = 0, pfn;
+ struct folio *folio;
struct page *page;
swp_entry_t entry;
pte_t pte;
@@ -168,41 +169,43 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
}

/*
- * By getting a reference on the page we pin it and that blocks
+ * By getting a reference on the folio we pin it and that blocks
* any kind of migration. Side effect is that it "freezes" the
* pte.
*
- * We drop this reference after isolating the page from the lru
- * for non device page (device page are not on the lru and thus
+ * We drop this reference after isolating the folio from the lru
+ * for non device folio (device folio are not on the lru and thus
* can't be dropped from it).
*/
- get_page(page);
+ folio = page_folio(page);
+ folio_get(folio);

/*
- * We rely on trylock_page() to avoid deadlock between
+ * We rely on folio_trylock() to avoid deadlock between
* concurrent migrations where each is waiting on the others
- * page lock. If we can't immediately lock the page we fail this
+ * folio lock. If we can't immediately lock the folio we fail this
* migration as it is only best effort anyway.
*
- * If we can lock the page it's safe to set up a migration entry
- * now. In the common case where the page is mapped once in a
+ * If we can lock the folio it's safe to set up a migration entry
+ * now. In the common case where the folio is mapped once in a
* single process setting up the migration entry now is an
* optimisation to avoid walking the rmap later with
* try_to_migrate().
*/
- if (trylock_page(page)) {
+ if (folio_trylock(folio)) {
bool anon_exclusive;
pte_t swp_pte;

flush_cache_page(vma, addr, pte_pfn(pte));
- anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
+ anon_exclusive = folio_test_anon(folio) &&
+ PageAnonExclusive(page);
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

if (page_try_share_anon_rmap(page)) {
set_pte_at(mm, addr, ptep, pte);
- unlock_page(page);
- put_page(page);
+ folio_unlock(folio);
+ folio_put(folio);
mpfn = 0;
goto next;
}
@@ -214,7 +217,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/* Set the dirty flag on the folio now the pte is gone. */
if (pte_dirty(pte))
- folio_mark_dirty(page_folio(page));
+ folio_mark_dirty(folio);

/* Setup special migration page table entry */
if (mpfn & MIGRATE_PFN_WRITE)
@@ -248,16 +251,16 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,

/*
* This is like regular unmap: we remove the rmap and
- * drop page refcount. Page won't be freed, as we took
- * a reference just above.
+ * drop the folio refcount. The folio won't be freed, as
+ * we took a reference just above.
*/
- page_remove_rmap(page, vma, false);
- put_page(page);
+ folio_remove_rmap_pte(folio, page, vma);
+ folio_put(folio);

if (pte_present(pte))
unmapped++;
} else {
- put_page(page);
+ folio_put(folio);
mpfn = 0;
}

--
2.43.0

2023-12-11 15:59:45

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 30/39] mm/rmap: page_remove_rmap() -> folio_remove_rmap_pte()

Let's convert try_to_unmap_one() and try_to_migrate_one().

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/rmap.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index dc3be5807cee..233432f08e36 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1649,7 +1649,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_unmap() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -1930,7 +1930,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -1998,7 +1998,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,

/*
* When racing against e.g. zap_pte_range() on another cpu,
- * in between its ptep_get_and_clear_full() and page_remove_rmap(),
+ * in between its ptep_get_and_clear_full() and folio_remove_rmap_*(),
* try_to_migrate() may return before page_mapped() has become false,
* if page table locking is skipped: use TTU_SYNC to wait for that.
*/
@@ -2291,7 +2291,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
if (unlikely(folio_test_hugetlb(folio)))
hugetlb_remove_rmap(folio);
else
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
if (vma->vm_flags & VM_LOCKED)
mlock_drain_local();
folio_put(folio);
@@ -2430,7 +2430,7 @@ static bool page_make_device_exclusive_one(struct folio *folio,
* There is a reference on the page for the swap entry which has
* been removed, so shouldn't take another.
*/
- page_remove_rmap(subpage, vma, false);
+ folio_remove_rmap_pte(folio, subpage, vma);
}

mmu_notifier_invalidate_range_end(&range);
--
2.43.0

2023-12-11 15:59:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 32/39] mm/rmap: remove page_remove_rmap()

All callers are gone, let's remove it and some leftover traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 4 +---
mm/filemap.c | 10 +++++-----
mm/internal.h | 2 +-
mm/memory-failure.c | 4 ++--
mm/rmap.c | 23 ++---------------------
5 files changed, 11 insertions(+), 32 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index a266dc0ef99e..0f4eecd03bdc 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -244,8 +244,6 @@ void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_file_rmap_ptes(folio, page, 1, vma)
void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
-void page_remove_rmap(struct page *, struct vm_area_struct *,
- bool compound);
void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
struct vm_area_struct *);
#define folio_remove_rmap_pte(folio, page, vma) \
@@ -392,7 +390,7 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
*
* This is similar to page_try_dup_anon_rmap(), however, not used during fork()
* to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via page_remove_rmap().
+ * unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
* private pages cannot get pinned and consequently this function cannot fail.
diff --git a/mm/filemap.c b/mm/filemap.c
index c0d7e1d7eea2..beff3865465a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -113,11 +113,11 @@
* ->i_pages lock (try_to_unmap_one)
* ->lruvec->lru_lock (follow_page->mark_page_accessed)
* ->lruvec->lru_lock (check_pte_range->isolate_lru_page)
- * ->private_lock (page_remove_rmap->set_page_dirty)
- * ->i_pages lock (page_remove_rmap->set_page_dirty)
- * bdi.wb->list_lock (page_remove_rmap->set_page_dirty)
- * ->inode->i_lock (page_remove_rmap->set_page_dirty)
- * ->memcg->move_lock (page_remove_rmap->folio_memcg_lock)
+ * ->private_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->i_pages lock (folio_remove_rmap_pte->set_page_dirty)
+ * bdi.wb->list_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->inode->i_lock (folio_remove_rmap_pte->set_page_dirty)
+ * ->memcg->move_lock (folio_remove_rmap_pte->folio_memcg_lock)
* bdi.wb->list_lock (zap_pte_range->set_page_dirty)
* ->inode->i_lock (zap_pte_range->set_page_dirty)
* ->private_lock (zap_pte_range->block_dirty_folio)
diff --git a/mm/internal.h b/mm/internal.h
index 222e63b2dea4..a94355e70bd7 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -651,7 +651,7 @@ folio_within_vma(struct folio *folio, struct vm_area_struct *vma)
* under page table lock for the pte/pmd being added or removed.
*
* mlock is usually called at the end of page_add_*_rmap(), munlock at
- * the end of page_remove_rmap(); but new anon folios are managed by
+ * the end of folio_remove_rmap_*(); but new anon folios are managed by
* folio_add_lru_vma() calling mlock_new_folio().
*/
void mlock_folio(struct folio *folio);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..01af9295c47c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2316,8 +2316,8 @@ int memory_failure(unsigned long pfn, int flags)
* We use page flags to determine what action should be taken, but
* the flags can be modified by the error containment action. One
* example is an mlocked page, where PG_mlocked is cleared by
- * page_remove_rmap() in try_to_unmap_one(). So to determine page status
- * correctly, we save a copy of the page flags at this time.
+ * folio_remove_rmap_*() in try_to_unmap_one(). So to determine page
+ * status correctly, we save a copy of the page flags at this time.
*/
page_flags = p->flags;

diff --git a/mm/rmap.c b/mm/rmap.c
index 233432f08e36..b08dd7d6779d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -470,7 +470,7 @@ void __init anon_vma_init(void)
/*
* Getting a lock on a stable anon_vma from a page off the LRU is tricky!
*
- * Since there is no serialization what so ever against page_remove_rmap()
+ * Since there is no serialization what so ever against folio_remove_rmap_*()
* the best this function can do is return a refcount increased anon_vma
* that might have been relevant to this page.
*
@@ -487,7 +487,7 @@ void __init anon_vma_init(void)
* [ something equivalent to page_mapped_in_vma() ].
*
* Since anon_vma's slab is SLAB_TYPESAFE_BY_RCU and we know from
- * page_remove_rmap() that the anon_vma pointer from page->mapping is valid
+ * folio_remove_rmap_*() that the anon_vma pointer from page->mapping is valid
* if there is a mapcount, we can dereference the anon_vma after observing
* those.
*
@@ -1499,25 +1499,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
#endif
}

-/**
- * page_remove_rmap - take down pte mapping from a page
- * @page: page to remove mapping from
- * @vma: the vm area from which the mapping is removed
- * @compound: uncharge the page as compound or small page
- *
- * The caller needs to hold the pte lock.
- */
-void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
- bool compound)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- folio_remove_rmap_pte(folio, page, vma);
- else
- folio_remove_rmap_pmd(folio, page, vma);
-}
-
static __always_inline void __folio_remove_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
enum rmap_mode mode)
--
2.43.0

2023-12-11 15:59:49

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 23/39] mm/rmap: introduce folio_remove_rmap_[pte|ptes|pmd]()

Let's mimic what we did with folio_add_file_rmap_*() and
folio_add_anon_rmap_*() so we can similarly replace page_remove_rmap()
next.

Make the compiler always special-case on the granularity by using
__always_inline.

We're adding folio_remove_rmap_ptes() handling right away, as we want to
use that soon for batching rmap operations when unmapping PTE-mapped
large folios.
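
For reference, a minimal sketch of the new calls (the caller holds the page
table lock; the pte variant is shorthand for the ptes variant with
nr_pages == 1):

        folio_remove_rmap_pte(folio, page, vma);            /* a single PTE  */
        folio_remove_rmap_ptes(folio, page, nr_pages, vma); /* nr_pages PTEs */
        folio_remove_rmap_pmd(folio, page, vma);            /* a single PMD  */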

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 6 ++++
mm/rmap.c | 79 ++++++++++++++++++++++++++++++++++++--------
2 files changed, 71 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0acebe41ab8e..a266dc0ef99e 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -246,6 +246,12 @@ void folio_add_file_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *);
void page_remove_rmap(struct page *, struct vm_area_struct *,
bool compound);
+void folio_remove_rmap_ptes(struct folio *, struct page *, int nr_pages,
+ struct vm_area_struct *);
+#define folio_remove_rmap_pte(folio, page, vma) \
+ folio_remove_rmap_ptes(folio, page, 1, vma)
+void folio_remove_rmap_pmd(struct folio *, struct page *,
+ struct vm_area_struct *);

void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address, rmap_t flags);
diff --git a/mm/rmap.c b/mm/rmap.c
index 9212726268ba..dc3be5807cee 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1511,25 +1511,38 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
bool compound)
{
struct folio *folio = page_folio(page);
+
+ if (likely(!compound))
+ folio_remove_rmap_pte(folio, page, vma);
+ else
+ folio_remove_rmap_pmd(folio, page, vma);
+}
+
+static __always_inline void __folio_remove_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *vma,
+ enum rmap_mode mode)
+{
atomic_t *mapped = &folio->_nr_pages_mapped;
- int nr = 0, nr_pmdmapped = 0;
- bool last;
+ int last, nr = 0, nr_pmdmapped = 0;
enum node_stat_item idx;

- VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

/* Is page being unmapped by PTE? Is this its last map to be removed? */
- if (likely(!compound)) {
- last = atomic_add_negative(-1, &page->_mapcount);
- nr = last;
- if (last && folio_test_large(folio)) {
- nr = atomic_dec_return_relaxed(mapped);
- nr = (nr < COMPOUND_MAPPED);
- }
- } else if (folio_test_pmd_mappable(folio)) {
- /* That test is redundant: it's for safety or to optimize out */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ last = atomic_add_negative(-1, &page->_mapcount);
+ if (last && folio_test_large(folio)) {
+ last = atomic_dec_return_relaxed(mapped);
+ last = (last < COMPOUND_MAPPED);
+ }

+ if (last)
+ nr++;
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
@@ -1544,6 +1557,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
nr = 0;
}
}
+ break;
}

if (nr_pmdmapped) {
@@ -1565,7 +1579,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
* is still mapped.
*/
if (folio_test_large(folio) && folio_test_anon(folio))
- if (!compound || nr < nr_pmdmapped)
+ if (mode == RMAP_MODE_PTE || nr < nr_pmdmapped)
deferred_split_folio(folio);
}

@@ -1580,6 +1594,43 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
munlock_vma_folio(folio, vma);
}

+/**
+ * folio_remove_rmap_ptes - remove PTE mappings from a page range of a folio
+ * @folio: The folio to remove the mappings from
+ * @page: The first page to remove
+ * @nr_pages: The number of pages that will be removed from the mapping
+ * @vma: The vm area from which the mappings are removed
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_ptes(struct folio *folio, struct page *page,
+ int nr_pages, struct vm_area_struct *vma)
+{
+ __folio_remove_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_remove_rmap_pmd - remove a PMD mapping from a page range of a folio
+ * @folio: The folio to remove the mapping from
+ * @page: The first page to remove
+ * @vma: The vm area from which the mapping is removed
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
+ struct vm_area_struct *vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_remove_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
/*
* @arg: enum ttu_flags will be passed to this argument
*/
--
2.43.0

2023-12-11 15:59:50

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 34/39] mm/rmap: introduce folio_try_dup_anon_rmap_[pte|ptes|pmd]()

The last users of page_needs_cow_for_dma() and __page_dup_rmap() are gone,
so remove them.

Add folio_try_dup_anon_rmap_ptes() right away, as we want to perform rmap
batching during fork() soon.
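
A simplified sketch of the intended fork() usage (mirroring the
copy_present_pte() conversion later in this series; error handling trimmed):

        folio_get(folio);
        if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
                /* The folio may be pinned: drop the reference and copy instead. */
                folio_put(folio);
                /* ... fall back to copy_present_page() ... */
        }
        /* On success, the duplicated PTE must stay R/O in parent and child. */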

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/mm.h | 6 --
include/linux/rmap.h | 150 ++++++++++++++++++++++++++++++-------------
2 files changed, 106 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ae547b62f325..30edf3f7d1f3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1975,12 +1975,6 @@ static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
return folio_maybe_dma_pinned(folio);
}

-static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
- struct page *page)
-{
- return folio_needs_cow_for_dma(vma, page_folio(page));
-}
-
/**
* is_zero_page - Query if a page is a zero page
* @page: The page to query
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index df60e44fecad..c6d8a02ecd56 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -365,68 +365,130 @@ static inline void folio_dup_file_rmap_pmd(struct folio *folio,
#endif
}

-static inline void __page_dup_rmap(struct page *page, bool compound)
+static __always_inline int __folio_try_dup_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma,
+ enum rmap_mode mode)
{
- VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
+ bool maybe_pinned;
+ int i;

- if (compound) {
- struct folio *folio = (struct folio *)page;
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- VM_BUG_ON_PAGE(compound && !PageHead(page), page);
+ /*
+ * If this folio may have been pinned by the parent process,
+ * don't allow to duplicate the mappings but instead require to e.g.,
+ * copy the subpage immediately for the child so that we'll always
+ * guarantee the pinned folio won't be randomly replaced in the
+ * future on write faults.
+ */
+ maybe_pinned = likely(!folio_is_device_private(folio)) &&
+ unlikely(folio_needs_cow_for_dma(src_vma, folio));
+
+ /*
+ * No need to check+clear for already shared PTEs/PMDs of the
+ * folio. But if any page is PageAnonExclusive, we must fallback to
+ * copying if the folio maybe pinned.
+ */
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ if (unlikely(maybe_pinned)) {
+ for (i = 0; i < nr_pages; i++)
+ if (PageAnonExclusive(page + i))
+ return -EBUSY;
+ }
+ do {
+ if (PageAnonExclusive(page))
+ ClearPageAnonExclusive(page);
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ if (PageAnonExclusive(page)) {
+ if (unlikely(maybe_pinned))
+ return -EBUSY;
+ ClearPageAnonExclusive(page);
+ }
atomic_inc(&folio->_entire_mapcount);
- } else {
- atomic_inc(&page->_mapcount);
+ break;
}
+ return 0;
}

/**
- * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
- * anonymous page
- * @page: the page to duplicate the mapping for
- * @compound: the page is mapped as compound or as a small page
- * @vma: the source vma
+ * folio_try_dup_anon_rmap_ptes - try duplicating PTE mappings of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ * @src_vma: The vm area from which the mappings are duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
*
- * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq.
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
*
- * Duplicating the mapping can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
+ * Duplicating the mappings can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
*
- * If duplicating the mapping succeeds, the page has to be mapped R/O into
- * the parent and the child. It must *not* get mapped writable after this call.
+ * If duplicating the mappings succeeded, the duplicated PTEs have to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
+ *
+ * Returns 0 if duplicating the mappings succeeded. Returns -EBUSY otherwise.
+ */
+static inline int folio_try_dup_anon_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages, struct vm_area_struct *src_vma)
+{
+ return __folio_try_dup_anon_rmap(folio, page, nr_pages, src_vma,
+ RMAP_MODE_PTE);
+}
+#define folio_try_dup_anon_rmap_pte(folio, page, vma) \
+ folio_try_dup_anon_rmap_ptes(folio, page, 1, vma)
+
+/**
+ * folio_try_dup_anon_rmap_pmd - try duplicating a PMD mapping of a page range
+ * of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ * @src_vma: The vm area from which the mapping is duplicated
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and the
+ * vma->vma_mm->write_protect_seq.
+ *
+ * Duplicating the mapping can only fail if the folio may be pinned; device
+ * private folios cannot get pinned and consequently this function cannot fail
+ * for them.
+ *
+ * If duplicating the mapping succeeds, the duplicated PMD has to be R/O in
+ * the parent and the child. They must *not* be writable after this call
+ * succeeded.
*
* Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise.
*/
+static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
+ struct page *page, struct vm_area_struct *src_vma)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_dup_anon_rmap(folio, page, HPAGE_PMD_NR, src_vma,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
struct vm_area_struct *vma)
{
- VM_BUG_ON_PAGE(!PageAnon(page), page);
-
- /*
- * No need to check+clear for already shared pages, including KSM
- * pages.
- */
- if (!PageAnonExclusive(page))
- goto dup;
+ struct folio *folio = page_folio(page);

- /*
- * If this page may have been pinned by the parent process,
- * don't allow to duplicate the mapping but instead require to e.g.,
- * copy the page immediately for the child so that we'll always
- * guarantee the pinned page won't be randomly replaced in the
- * future on write faults.
- */
- if (likely(!is_device_private_page(page)) &&
- unlikely(page_needs_cow_for_dma(vma, page)))
- return -EBUSY;
-
- ClearPageAnonExclusive(page);
- /*
- * It's okay to share the anon page between both processes, mapping
- * the page R/O into both processes.
- */
-dup:
- __page_dup_rmap(page, compound);
- return 0;
+ if (likely(!compound))
+ return folio_try_dup_anon_rmap_pte(folio, page, vma);
+ return folio_try_dup_anon_rmap_pmd(folio, page, vma);
}

/**
--
2.43.0

2023-12-11 15:59:53

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 31/39] Documentation: stop referring to page_remove_rmap()

Refer to folio_remove_rmap_*() instead.

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
Documentation/mm/unevictable-lru.rst | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index 9a607059ea11..cf81272a6b8b 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -156,7 +156,7 @@ Partial unmap and deferred_split_folio()

Unmapping part of THP (with munmap() or other way) is not going to free
memory immediately. Instead, we detect that a subpage of THP is not in use
-in page_remove_rmap() and queue the THP for splitting if memory pressure
+in folio_remove_rmap_*() and queue the THP for splitting if memory pressure
comes. Splitting will free up unused subpages.

Splitting the page right away is not an option due to locking context in
diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevictable-lru.rst
index 67f1338440a5..b6a07a26b10d 100644
--- a/Documentation/mm/unevictable-lru.rst
+++ b/Documentation/mm/unevictable-lru.rst
@@ -486,7 +486,7 @@ munlock the pages if we're removing the last VM_LOCKED VMA that maps the pages.
Before the unevictable/mlock changes, mlocking did not mark the pages in any
way, so unmapping them required no processing.

-For each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+For each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

@@ -511,7 +511,7 @@ userspace; truncation even unmaps and deletes any private anonymous pages
which had been Copied-On-Write from the file pages now being truncated.

Mlocked pages can be munlocked and deleted in this way: like with munmap(),
-for each PTE (or PMD) being unmapped from a VMA, page_remove_rmap() calls
+for each PTE (or PMD) being unmapped from a VMA, folio_remove_rmap_*() calls
munlock_vma_folio(), which calls munlock_folio() when the VMA is VM_LOCKED
(unless it was a PTE mapping of a part of a transparent huge page).

--
2.43.0

2023-12-11 16:00:12

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 21/39] mm/rmap: remove page_add_anon_rmap()

All users are gone, remove it and all traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 2 --
mm/rmap.c | 31 ++++---------------------------
2 files changed, 4 insertions(+), 29 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 3b5357cb1c09..bd4edae4dbe7 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -240,8 +240,6 @@ void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
void folio_add_anon_rmap_pmd(struct folio *, struct page *,
struct vm_area_struct *, unsigned long address, rmap_t flags);
-void page_add_anon_rmap(struct page *, struct vm_area_struct *,
- unsigned long address, rmap_t flags);
void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
unsigned long address);
void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
diff --git a/mm/rmap.c b/mm/rmap.c
index 7787499fa2ad..83cba8909848 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1271,7 +1271,7 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
* The page's anon-rmap details (mapping and index) are guaranteed to
* be set up correctly at this point.
*
- * We have exclusion against page_add_anon_rmap because the caller
+ * We have exclusion against folio_add_anon_rmap_*() because the caller
* always holds the page locked.
*
* We have exclusion against page_add_new_anon_rmap because those pages
@@ -1284,29 +1284,6 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
page);
}

-/**
- * page_add_anon_rmap - add pte mapping to an anonymous page
- * @page: the page to add the mapping to
- * @vma: the vm area in which the mapping is added
- * @address: the user virtual address mapped
- * @flags: the rmap flags
- *
- * The caller needs to hold the pte lock, and the page must be locked in
- * the anon_vma case: to serialize mapping,index checking after setting,
- * and to ensure that PageAnon is not being upgraded racily to PageKsm
- * (but PageKsm is never downgraded to PageAnon).
- */
-void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
- unsigned long address, rmap_t flags)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!(flags & RMAP_COMPOUND)))
- folio_add_anon_rmap_pte(folio, page, vma, address, flags);
- else
- folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
-}
-
static __always_inline void __folio_add_anon_rmap(struct folio *folio,
struct page *page, int nr_pages, struct vm_area_struct *vma,
unsigned long address, rmap_t flags, enum rmap_mode mode)
@@ -1420,7 +1397,7 @@ void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
* @vma: the vm area in which the mapping is added
* @address: the user virtual address mapped
*
- * Like page_add_anon_rmap() but must only be called on *new* folios.
+ * Like folio_add_anon_rmap_*() but must only be called on *new* folios.
* This means the inc-and-test can be bypassed.
* The folio does not have to be locked.
*
@@ -1480,7 +1457,7 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
if (nr)
__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);

- /* See comments in page_add_anon_rmap() */
+ /* See comments in folio_add_anon_rmap_*() */
if (!folio_test_large(folio))
mlock_vma_folio(folio, vma);
}
@@ -1594,7 +1571,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,

/*
* It would be tidy to reset folio_test_anon mapping when fully
- * unmapped, but that might overwrite a racing page_add_anon_rmap
+ * unmapped, but that might overwrite a racing folio_add_anon_rmap_*()
* which increments mapcount after us but sets mapping before us:
* so leave the reset to free_pages_prepare, and remember that
* it's only reliable while mapped.
--
2.43.0

2023-12-11 16:00:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 36/39] mm/memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pte()

Let's convert copy_nonpresent_pte(). While at it, perform some more
folio conversion.

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/memory.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 42a0b7b41b86..caaf4add6fa2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -785,6 +785,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
unsigned long vm_flags = dst_vma->vm_flags;
pte_t orig_pte = ptep_get(src_pte);
pte_t pte = orig_pte;
+ struct folio *folio;
struct page *page;
swp_entry_t entry = pte_to_swp_entry(orig_pte);

@@ -829,6 +830,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}
} else if (is_device_private_entry(entry)) {
page = pfn_swap_entry_to_page(entry);
+ folio = page_folio(page);

/*
* Update rss count even for unaddressable pages, as
@@ -839,10 +841,10 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* for unaddressable pages, at some point. But for now
* keep things as they are.
*/
- get_page(page);
+ folio_get(folio);
rss[mm_counter(page)]++;
/* Cannot fail as these pages cannot get pinned. */
- BUG_ON(page_try_dup_anon_rmap(page, false, src_vma));
+ folio_try_dup_anon_rmap_pte(folio, page, src_vma);

/*
* We do not preserve soft-dirty information, because so
@@ -956,7 +958,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
* future.
*/
folio_get(folio);
- if (unlikely(page_try_dup_anon_rmap(page, false, src_vma))) {
+ if (unlikely(folio_try_dup_anon_rmap_pte(folio, page, src_vma))) {
/* Page may be pinned, we have to copy. */
folio_put(folio);
return copy_present_page(dst_vma, src_vma, dst_pte, src_pte,
--
2.43.0

2023-12-11 16:00:37

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 38/39] mm: convert page_try_share_anon_rmap() to folio_try_share_anon_rmap_[pte|pmd]()

Let's convert it like we converted all the other rmap functions.
Don't introduce folio_try_share_anon_rmap_ptes() for now, as there is no
user in sight that wants rmap batching. It is pretty easy to add later.

All users are easy to convert -- only ksm.c doesn't use folios yet but
that is left for future work -- so let's just do it in a single shot.

While at it, turn the BUG_ON into a WARN_ON_ONCE.

Note that page_try_share_anon_rmap() so far didn't care about pte/pmd
mappings (no compound parameter). We're changing that so we can perform
better sanity checks and make the code actually more readable/consistent.
For example, __folio_rmap_sanity_checks() will make sure that a PMD
range actually falls completely into the folio.
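
A simplified sketch of the PTE case as used in try_to_migrate_one() (the page
table entry was already cleared/invalidated and the page table lock is held):

        if (anon_exclusive && folio_try_share_anon_rmap_pte(folio, subpage)) {
                /* The folio may be pinned: restore the PTE and fail migration. */
                set_pte_at(mm, address, pvmw.pte, pteval);
                ret = false;
        }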

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 95 +++++++++++++++++++++++++++++++++-----------
mm/gup.c | 2 +-
mm/huge_memory.c | 9 +++--
mm/internal.h | 4 +-
mm/ksm.c | 5 ++-
mm/migrate_device.c | 2 +-
mm/rmap.c | 11 ++---
7 files changed, 89 insertions(+), 39 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1e37ee6ae0ba..1e54a28cc884 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -272,7 +272,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
return 0;
}

-/* See page_try_share_anon_rmap() */
+/* See folio_try_share_anon_rmap_*() */
static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
{
VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
@@ -481,30 +481,15 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-/**
- * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
- * shared to prepare for KSM or temporary unmapping
- * @page: the exclusive anonymous page to try marking possibly shared
- *
- * The caller needs to hold the PT lock and has to have the page table entry
- * cleared/invalidated.
- *
- * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
- * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
- * unmapping a page (swap, migration) via folio_remove_rmap_*().
- *
- * Marking the page shared can only fail if the page may be pinned; device
- * private pages cannot get pinned and consequently this function cannot fail.
- *
- * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY
- * otherwise.
- */
-static inline int page_try_share_anon_rmap(struct page *page)
+static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
{
- VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page);
+ VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
+ VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);

- /* device private pages cannot get pinned via GUP. */
- if (unlikely(is_device_private_page(page))) {
+ /* device private folios cannot get pinned via GUP. */
+ if (unlikely(folio_is_device_private(folio))) {
ClearPageAnonExclusive(page);
return 0;
}
@@ -555,7 +540,7 @@ static inline int page_try_share_anon_rmap(struct page *page)
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_mb();

- if (unlikely(page_maybe_dma_pinned(page)))
+ if (unlikely(folio_maybe_dma_pinned(folio)))
return -EBUSY;
ClearPageAnonExclusive(page);

@@ -568,6 +553,68 @@ static inline int page_try_share_anon_rmap(struct page *page)
return 0;
}

+/**
+ * folio_try_share_anon_rmap_pte - try marking an exclusive anonymous page
+ * mapped by a PTE possibly shared to prepare
+ * for KSM or temporary unmapping
+ * @folio: The folio to share a mapping of
+ * @page: The mapped exclusive page
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pte(), however, not used during
+ * fork() to duplicate mappings, but instead to prepare for KSM or temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pte().
+ *
+ * Marking the mapped page shared can only fail if the folio maybe pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped page possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
+ struct page *page)
+{
+ return __folio_try_share_anon_rmap(folio, page, 1, RMAP_MODE_PTE);
+}
+
+/**
+ * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
+ * range mapped by a PMD possibly shared to
+ * prepare for temporary unmapping
+ * @folio: The folio to share the mapping of
+ * @page: The first page to share the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock and has to have the page table
+ * entries cleared/invalidated.
+ *
+ * This is similar to folio_try_dup_anon_rmap_pmd(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for temporarily
+ * unmapping parts of a folio (swap, migration) via folio_remove_rmap_pmd().
+ *
+ * Marking the mapped pages shared can only fail if the folio maybe pinned;
+ * device private folios cannot get pinned and consequently this function cannot
+ * fail.
+ *
+ * Returns 0 if marking the mapped pages possibly shared succeeded. Returns
+ * -EBUSY otherwise.
+ */
+static inline int folio_try_share_anon_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ return __folio_try_share_anon_rmap(folio, page, HPAGE_PMD_NR,
+ RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+ return -EBUSY;
+#endif
+}
+
/*
* Called from mm/vmscan.c to handle paging out
*/
diff --git a/mm/gup.c b/mm/gup.c
index 0a5f0e91bfec..df83182ec72d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -177,7 +177,7 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
/*
* Adjust the pincount before re-checking the PTE for changes.
* This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
+ * barrier in folio_try_share_anon_rmap_*().
*/
smp_mb__after_atomic();

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 34f878916621..c681296fa429 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2523,10 +2523,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
* In case we cannot clear PageAnonExclusive(), split the PMD
* only and let try_to_migrate_one() fail later.
*
- * See page_try_share_anon_rmap(): invalidate PMD first.
+ * See folio_try_share_anon_rmap_pmd(): invalidate PMD first.
*/
anon_exclusive = PageAnonExclusive(page);
- if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
+ if (freeze && anon_exclusive &&
+ folio_try_share_anon_rmap_pmd(folio, page))
freeze = false;
if (!freeze) {
rmap_t rmap_flags = RMAP_NONE;
@@ -3554,9 +3555,9 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
flush_cache_range(vma, address, address + HPAGE_PMD_SIZE);
pmdval = pmdp_invalidate(vma, address, pvmw->pmd);

- /* See page_try_share_anon_rmap(): invalidate PMD first. */
+ /* See folio_try_share_anon_rmap_pmd(): invalidate PMD first. */
anon_exclusive = folio_test_anon(folio) && PageAnonExclusive(page);
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ if (anon_exclusive && folio_try_share_anon_rmap_pmd(folio, page)) {
set_pmd_at(mm, address, pvmw->pmd, pmdval);
return -EBUSY;
}
diff --git a/mm/internal.h b/mm/internal.h
index a94355e70bd7..29589bc3f046 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1047,7 +1047,7 @@ enum {
* * Ordinary GUP: Using the PT lock
* * GUP-fast and fork(): mm->write_protect_seq
* * GUP-fast and KSM or temporary unmapping (swap, migration): see
- * page_try_share_anon_rmap()
+ * folio_try_share_anon_rmap_*()
*
* Must be called with the (sub)page that's actually referenced via the
* page table entry, which might not necessarily be the head page for a
@@ -1090,7 +1090,7 @@ static inline bool gup_must_unshare(struct vm_area_struct *vma,
return is_cow_mapping(vma->vm_flags);
}

- /* Paired with a memory barrier in page_try_share_anon_rmap(). */
+ /* Paired with a memory barrier in folio_try_share_anon_rmap_*(). */
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
smp_rmb();

diff --git a/mm/ksm.c b/mm/ksm.c
index b3d0cfaa2533..be76e9dabf4f 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1161,8 +1161,9 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
goto out_unlock;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
- if (anon_exclusive && page_try_share_anon_rmap(page)) {
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
+ if (anon_exclusive &&
+ folio_try_share_anon_rmap_pte(page_folio(page), page)) {
set_pte_at(mm, pvmw.address, pvmw.pte, entry);
goto out_unlock;
}
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index c51c99151ebb..9d0c1ad73722 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -202,7 +202,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
if (anon_exclusive) {
pte = ptep_clear_flush(vma, addr, ptep);

- if (page_try_share_anon_rmap(page)) {
+ if (folio_try_share_anon_rmap_pte(folio, page)) {
set_pte_at(mm, addr, ptep, pte);
folio_unlock(folio);
folio_put(folio);
diff --git a/mm/rmap.c b/mm/rmap.c
index b08dd7d6779d..45296739236f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,9 +1868,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
break;
}

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
swap_free(entry);
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
@@ -2144,7 +2144,8 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
pte_t swp_pte;

if (anon_exclusive)
- BUG_ON(page_try_share_anon_rmap(subpage));
+ WARN_ON_ONCE(folio_try_share_anon_rmap_pte(folio,
+ subpage));

/*
* Store the pfn of the page in a special migration
@@ -2215,7 +2216,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
VM_BUG_ON_PAGE(pte_write(pteval) && folio_test_anon(folio) &&
!anon_exclusive, subpage);

- /* See page_try_share_anon_rmap(): clear PTE first. */
+ /* See folio_try_share_anon_rmap_pte(): clear PTE first. */
if (folio_test_hugetlb(folio)) {
if (anon_exclusive &&
hugetlb_try_share_anon_rmap(folio)) {
@@ -2226,7 +2227,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
break;
}
} else if (anon_exclusive &&
- page_try_share_anon_rmap(subpage)) {
+ folio_try_share_anon_rmap_pte(folio, subpage)) {
set_pte_at(mm, address, pvmw.pte, pteval);
ret = false;
page_vma_mapped_walk_done(&pvmw);
--
2.43.0
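
As a self-contained illustration of the caller contract spelled out in the new
kernel-doc above: the page table entry is cleared/invalidated first, the share
attempt comes second, and the entry is put back if the folio may be pinned.
This is only a userspace model with invented "_model" stand-ins, not kernel
code; the ordering is what the memory-barrier comments in the mm/gup.c and
mm/internal.h hunks refer to.

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel objects and helpers; not the real definitions. */
struct pte_model   { bool present; };
struct folio_model { int pincount; bool anon_exclusive; };

static struct pte_model ptep_clear_flush_model(struct pte_model *ptep)
{
    struct pte_model old = *ptep;

    ptep->present = false;  /* GUP-fast can no longer map the page */
    return old;
}

static void set_pte_at_model(struct pte_model *ptep, struct pte_model pteval)
{
    *ptep = pteval;
}

/* Models folio_try_share_anon_rmap_pte(): refuse if the folio may be pinned. */
static int try_share_anon_rmap_model(struct folio_model *folio)
{
    if (folio->pincount > 0)
        return -EBUSY;
    folio->anon_exclusive = false;
    return 0;
}

/* The ordering that matters: clear the PTE first, only then try to share. */
static bool unmap_one_model(struct folio_model *folio, struct pte_model *ptep)
{
    struct pte_model pteval = ptep_clear_flush_model(ptep);

    if (folio->anon_exclusive && try_share_anon_rmap_model(folio)) {
        /* Folio may be pinned: restore the mapping and give up. */
        set_pte_at_model(ptep, pteval);
        return false;
    }
    /* A swap/migration entry would be installed here. */
    return true;
}

int main(void)
{
    struct folio_model folio = { .pincount = 1, .anon_exclusive = true };
    struct pte_model pte = { .present = true };

    printf("unmap %s, pte still present: %d\n",
           unmap_one_model(&folio, &pte) ? "succeeded" : "deferred",
           pte.present);
    return 0;
}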

2023-12-11 16:01:12

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 35/39] mm/huge_memory: page_try_dup_anon_rmap() -> folio_try_dup_anon_rmap_pmd()

Let's convert copy_huge_pmd() and fixup the comment in copy_huge_pud().
While at it, perform more folio conversion in copy_huge_pmd().
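
The conversion below keeps the existing fallback: take a folio reference, try
to duplicate the anon rmap at PMD granularity, and if the folio may be pinned,
drop the reference again so the copy can be retried at PTE granularity after
splitting. A condensed userspace model of that control flow; the "_model"
names and the exact -EAGAIN return value are illustrative only:

#include <errno.h>
#include <stdio.h>

struct folio_model { int refcount; int pincount; };

static void folio_get_model(struct folio_model *f) { f->refcount++; }
static void folio_put_model(struct folio_model *f) { f->refcount--; }

/* Models folio_try_dup_anon_rmap_pmd(): refuses if the folio may be pinned. */
static int try_dup_anon_rmap_pmd_model(struct folio_model *f)
{
    return f->pincount ? -EBUSY : 0;
}

/* Models the copy_huge_pmd() fallback: back out so PTEs can be retried. */
static int copy_one_pmd_model(struct folio_model *src_folio)
{
    folio_get_model(src_folio);
    if (try_dup_anon_rmap_pmd_model(src_folio)) {
        folio_put_model(src_folio);
        return -EAGAIN;     /* "split and retry the fault on PTEs" */
    }
    return 0;
}

int main(void)
{
    struct folio_model pinned = { .refcount = 1, .pincount = 1 };
    struct folio_model plain  = { .refcount = 1, .pincount = 0 };

    printf("pinned folio: %d, ordinary folio: %d\n",
           copy_one_pmd_model(&pinned), copy_one_pmd_model(&plain));
    return 0;
}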

Signed-off-by: David Hildenbrand <[email protected]>
---
mm/huge_memory.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cfaa8b823015..34f878916621 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1275,6 +1275,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
{
spinlock_t *dst_ptl, *src_ptl;
struct page *src_page;
+ struct folio *src_folio;
pmd_t pmd;
pgtable_t pgtable = NULL;
int ret = -ENOMEM;
@@ -1341,11 +1342,12 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,

src_page = pmd_page(pmd);
VM_BUG_ON_PAGE(!PageHead(src_page), src_page);
+ src_folio = page_folio(src_page);

- get_page(src_page);
- if (unlikely(page_try_dup_anon_rmap(src_page, true, src_vma))) {
+ folio_get(src_folio);
+ if (unlikely(folio_try_dup_anon_rmap_pmd(src_folio, src_page, src_vma))) {
/* Page maybe pinned: split and retry the fault on PTEs. */
- put_page(src_page);
+ folio_put(src_folio);
pte_free(dst_mm, pgtable);
spin_unlock(src_ptl);
spin_unlock(dst_ptl);
@@ -1454,8 +1456,8 @@ int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
}

/*
- * TODO: once we support anonymous pages, use page_try_dup_anon_rmap()
- * and split if duplicating fails.
+ * TODO: once we support anonymous pages, use
+ * folio_try_dup_anon_rmap_*() and split if duplicating fails.
*/
pudp_set_wrprotect(src_mm, addr, src_pud);
pud = pud_mkold(pud_wrprotect(pud));
--
2.43.0

2023-12-11 16:01:21

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 39/39] mm/rmap: rename COMPOUND_MAPPED to ENTIRELY_MAPPED

We removed all "bool compound" and RMAP_COMPOUND parameters. Let's
remove the remaining "compound" terminology by making COMPOUND_MAPPED
match the "folio->_entire_mapcount" terminology, renaming it to
ENTIRELY_MAPPED.

ENTIRELY_MAPPED is only used when the whole folio is mapped using a single
page table entry (e.g., a single PMD mapping a PMD-sized THP). For now,
we don't support mapping any THP bigger than that, so ENTIRELY_MAPPED
only applies to PMD-mapped PMD-sized THP.
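
For readers following the rename, a tiny userspace model of how
folio->_nr_pages_mapped combines the number of PTE-mapped pages with the
ENTIRELY_MAPPED marker used for a single whole-folio (PMD) mapping. The
"_model" helpers are invented for illustration; the real counters are atomics
with the race handling visible in the mm/rmap.c hunks below:

#include <stdio.h>

/* Same encoding idea as mm/internal.h; values copied from the hunk below. */
#define ENTIRELY_MAPPED     0x800000
#define FOLIO_PAGES_MAPPED  (ENTIRELY_MAPPED - 1)

/* Simplified, non-atomic model of the two folio counters involved. */
struct folio_model {
    int entire_mapcount;    /* whole-folio mappings; kernel's counter starts at -1 */
    int nr_pages_mapped;    /* models folio->_nr_pages_mapped */
};

/* One page of the folio gets mapped by a PTE. */
static void map_pte_model(struct folio_model *f)
{
    f->nr_pages_mapped++;
}

/* The whole folio gets mapped by a single PMD. */
static void map_pmd_model(struct folio_model *f)
{
    if (f->entire_mapcount++ == 0)
        f->nr_pages_mapped += ENTIRELY_MAPPED;
}

static void dump_model(const struct folio_model *f)
{
    printf("PTE-mapped pages: %d, entirely mapped: %s\n",
           f->nr_pages_mapped & FOLIO_PAGES_MAPPED,
           (f->nr_pages_mapped & ENTIRELY_MAPPED) ? "yes" : "no");
}

int main(void)
{
    struct folio_model thp = { 0 };

    map_pmd_model(&thp);    /* e.g. a PMD-sized THP mapped once as a whole */
    map_pte_model(&thp);    /* later, one of its pages is also PTE-mapped */
    dump_model(&thp);
    return 0;
}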

Signed-off-by: David Hildenbrand <[email protected]>
---
Documentation/mm/transhuge.rst | 2 +-
mm/internal.h | 6 +++---
mm/rmap.c | 18 +++++++++---------
3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/Documentation/mm/transhuge.rst b/Documentation/mm/transhuge.rst
index cf81272a6b8b..93c9239b9ebe 100644
--- a/Documentation/mm/transhuge.rst
+++ b/Documentation/mm/transhuge.rst
@@ -117,7 +117,7 @@ pages:

- map/unmap of a PMD entry for the whole THP increment/decrement
folio->_entire_mapcount and also increment/decrement
- folio->_nr_pages_mapped by COMPOUND_MAPPED when _entire_mapcount
+ folio->_nr_pages_mapped by ENTIRELY_MAPPED when _entire_mapcount
goes from -1 to 0 or 0 to -1.

- map/unmap of individual pages with PTE entry increment/decrement
diff --git a/mm/internal.h b/mm/internal.h
index 29589bc3f046..188807d2aebc 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -54,12 +54,12 @@ void page_writeback_init(void);

/*
* If a 16GB hugetlb folio were mapped by PTEs of all of its 4kB pages,
- * its nr_pages_mapped would be 0x400000: choose the COMPOUND_MAPPED bit
+ * its nr_pages_mapped would be 0x400000: choose the ENTIRELY_MAPPED bit
* above that range, instead of 2*(PMD_SIZE/PAGE_SIZE). Hugetlb currently
* leaves nr_pages_mapped at 0, but avoid surprise if it participates later.
*/
-#define COMPOUND_MAPPED 0x800000
-#define FOLIO_PAGES_MAPPED (COMPOUND_MAPPED - 1)
+#define ENTIRELY_MAPPED 0x800000
+#define FOLIO_PAGES_MAPPED (ENTIRELY_MAPPED - 1)

/*
* Flags passed to __show_mem() and show_free_areas() to suppress output in
diff --git a/mm/rmap.c b/mm/rmap.c
index 45296739236f..53753834a10d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1173,7 +1173,7 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
first = atomic_inc_and_test(&page->_mapcount);
if (first && folio_test_large(folio)) {
first = atomic_inc_return_relaxed(mapped);
- first = (first < COMPOUND_MAPPED);
+ first = (first < ENTIRELY_MAPPED);
}

if (first)
@@ -1183,15 +1183,15 @@ static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
case RMAP_MODE_PMD:
first = atomic_inc_and_test(&folio->_entire_mapcount);
if (first) {
- nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
+ nr = atomic_add_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED + ENTIRELY_MAPPED)) {
*nr_pmdmapped = folio_nr_pages(folio);
nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of a remove and another add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* Raced ahead of a remove of COMPOUND_MAPPED */
+ /* Raced ahead of a remove of ENTIRELY_MAPPED */
nr = 0;
}
}
@@ -1434,7 +1434,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
} else {
/* increment count (starts at -1) */
atomic_set(&folio->_entire_mapcount, 0);
- atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
+ atomic_set(&folio->_nr_pages_mapped, ENTIRELY_MAPPED);
SetPageAnonExclusive(&folio->page);
__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
}
@@ -1516,7 +1516,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
last = atomic_add_negative(-1, &page->_mapcount);
if (last && folio_test_large(folio)) {
last = atomic_dec_return_relaxed(mapped);
- last = (last < COMPOUND_MAPPED);
+ last = (last < ENTIRELY_MAPPED);
}

if (last)
@@ -1526,15 +1526,15 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
case RMAP_MODE_PMD:
last = atomic_add_negative(-1, &folio->_entire_mapcount);
if (last) {
- nr = atomic_sub_return_relaxed(COMPOUND_MAPPED, mapped);
- if (likely(nr < COMPOUND_MAPPED)) {
+ nr = atomic_sub_return_relaxed(ENTIRELY_MAPPED, mapped);
+ if (likely(nr < ENTIRELY_MAPPED)) {
nr_pmdmapped = folio_nr_pages(folio);
nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
/* Raced ahead of another remove and an add? */
if (unlikely(nr < 0))
nr = 0;
} else {
- /* An add of COMPOUND_MAPPED raced ahead */
+ /* An add of ENTIRELY_MAPPED raced ahead */
nr = 0;
}
}
--
2.43.0

2023-12-11 16:01:26

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

Let's convert remove_migration_pte().

Reviewed-by: Yin Fengwei <[email protected]>
Signed-off-by: David Hildenbrand <[email protected]>
---
mm/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index de9d94b99ab7..efc19f53b05e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
page_add_anon_rmap(new, vma, pvmw.address,
rmap_flags);
else
- page_add_file_rmap(new, vma, false);
+ folio_add_file_rmap_pte(folio, new, vma);
set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
}
if (vma->vm_flags & VM_LOCKED)
--
2.43.0

2023-12-11 16:01:40

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 33/39] mm/rmap: convert page_dup_file_rmap() to folio_dup_file_rmap_[pte|ptes|pmd]()

Let's convert page_dup_file_rmap() like the other rmap functions. As there
is only a single caller, convert that single caller right away and remove
page_dup_file_rmap().

Add folio_dup_file_rmap_ptes() right away, as we want to perform rmap
batching during fork() soon.
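
As a rough sketch of what such batching could look like: one call covering a
run of consecutive PTEs that map consecutive pages of the same folio, instead
of one call per PTE. The loop and the "_model" types are hypothetical; the
real batched caller only arrives with the follow-up work referenced in the
cover letter.

#include <stdio.h>

#define FOLIO_PAGES 4   /* arbitrary small folio for the illustration */

struct page_model  { int mapcount; };
struct folio_model { struct page_model pages[FOLIO_PAGES]; };

/* Models folio_dup_file_rmap_ptes(): bump nr consecutive per-page mapcounts. */
static void dup_file_rmap_ptes_model(struct folio_model *folio, int first, int nr)
{
    for (int i = 0; i < nr; i++)
        folio->pages[first + i].mapcount++;
}

int main(void)
{
    struct folio_model folio = { 0 };

    /*
     * A batch-aware copy loop could notice that FOLIO_PAGES consecutive
     * PTEs map consecutive pages of the same folio and duplicate all of
     * their mappings with a single call instead of FOLIO_PAGES calls.
     */
    dup_file_rmap_ptes_model(&folio, 0, FOLIO_PAGES);

    for (int i = 0; i < FOLIO_PAGES; i++)
        printf("page %d mapcount: %d\n", i, folio.pages[i].mapcount);
    return 0;
}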

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 59 ++++++++++++++++++++++++++++++++++++++++----
mm/memory.c | 2 +-
2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0f4eecd03bdc..df60e44fecad 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -311,6 +311,60 @@ static inline void hugetlb_remove_rmap(struct folio *folio)
atomic_dec(&folio->_entire_mapcount);
}

+static __always_inline void __folio_dup_file_rmap(struct folio *folio,
+ struct page *page, int nr_pages, enum rmap_mode mode)
+{
+ __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
+
+ switch (mode) {
+ case RMAP_MODE_PTE:
+ do {
+ atomic_inc(&page->_mapcount);
+ } while (page++, --nr_pages > 0);
+ break;
+ case RMAP_MODE_PMD:
+ atomic_inc(&folio->_entire_mapcount);
+ break;
+ }
+}
+
+/**
+ * folio_dup_file_rmap_ptes - duplicate PTE mappings of a page range of a folio
+ * @folio: The folio to duplicate the mappings of
+ * @page: The first page to duplicate the mappings of
+ * @nr_pages: The number of pages of which the mapping will be duplicated
+ *
+ * The page range of the folio is defined by [page, page + nr_pages)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_ptes(struct folio *folio,
+ struct page *page, int nr_pages)
+{
+ __folio_dup_file_rmap(folio, page, nr_pages, RMAP_MODE_PTE);
+}
+#define folio_dup_file_rmap_pte(folio, page) \
+ folio_dup_file_rmap_ptes(folio, page, 1)
+
+/**
+ * folio_dup_file_rmap_pmd - duplicate a PMD mapping of a page range of a folio
+ * @folio: The folio to duplicate the mapping of
+ * @page: The first page to duplicate the mapping of
+ *
+ * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
+ *
+ * The caller needs to hold the page table lock.
+ */
+static inline void folio_dup_file_rmap_pmd(struct folio *folio,
+ struct page *page)
+{
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+ __folio_dup_file_rmap(folio, page, HPAGE_PMD_NR, RMAP_MODE_PMD);
+#else
+ WARN_ON_ONCE(true);
+#endif
+}
+
static inline void __page_dup_rmap(struct page *page, bool compound)
{
VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
@@ -325,11 +379,6 @@ static inline void __page_dup_rmap(struct page *page, bool compound)
}
}

-static inline void page_dup_file_rmap(struct page *page, bool compound)
-{
- __page_dup_rmap(page, compound);
-}
-
/**
* page_try_dup_anon_rmap - try duplicating a mapping of an already mapped
* anonymous page
diff --git a/mm/memory.c b/mm/memory.c
index 9a5724cf895f..42a0b7b41b86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -965,7 +965,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
rss[MM_ANONPAGES]++;
} else if (page) {
folio_get(folio);
- page_dup_file_rmap(page, false);
+ folio_dup_file_rmap_pte(folio, page);
rss[mm_counter_file(page)]++;
}

--
2.43.0

2023-12-11 16:01:42

by David Hildenbrand

[permalink] [raw]
Subject: [PATCH v1 37/39] mm/rmap: remove page_try_dup_anon_rmap()

All users are gone, remove page_try_dup_anon_rmap() and any remaining
traces.

Signed-off-by: David Hildenbrand <[email protected]>
---
include/linux/rmap.h | 16 +++-------------
1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index c6d8a02ecd56..1e37ee6ae0ba 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -256,7 +256,7 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
unsigned long address);

-/* See page_try_dup_anon_rmap() */
+/* See folio_try_dup_anon_rmap_*() */
static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
struct vm_area_struct *vma)
{
@@ -481,16 +481,6 @@ static inline int folio_try_dup_anon_rmap_pmd(struct folio *folio,
#endif
}

-static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
- struct vm_area_struct *vma)
-{
- struct folio *folio = page_folio(page);
-
- if (likely(!compound))
- return folio_try_dup_anon_rmap_pte(folio, page, vma);
- return folio_try_dup_anon_rmap_pmd(folio, page, vma);
-}
-
/**
* page_try_share_anon_rmap - try marking an exclusive anonymous page possibly
* shared to prepare for KSM or temporary unmapping
@@ -499,8 +489,8 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound,
* The caller needs to hold the PT lock and has to have the page table entry
* cleared/invalidated.
*
- * This is similar to page_try_dup_anon_rmap(), however, not used during fork()
- * to duplicate a mapping, but instead to prepare for KSM or temporarily
+ * This is similar to folio_try_dup_anon_rmap_*(), however, not used during
+ * fork() to duplicate a mapping, but instead to prepare for KSM or temporarily
* unmapping a page (swap, migration) via folio_remove_rmap_*().
*
* Marking the page shared can only fail if the page may be pinned; device
--
2.43.0

2023-12-11 16:14:47

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Reviewed-by: Muchun Song <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 4 ++--
> mm/hugetlb.c | 8 ++++----
> mm/migrate.c | 4 ++--
> mm/rmap.c | 8 ++++----
> 4 files changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index af6a32b6f3e7..0bfea866f39b 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -208,9 +208,9 @@ void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> -void hugepage_add_anon_rmap(struct folio *, struct vm_area_struct *,
> +void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> -void hugepage_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> +void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> static inline void __page_dup_rmap(struct page *page, bool compound)
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 6feb3e0630d1..305f3ca1dee6 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5285,7 +5285,7 @@ hugetlb_install_folio(struct vm_area_struct *vma, pte_t *ptep, unsigned long add
> pte_t newpte = make_huge_pte(vma, &new_folio->page, 1);
>
> __folio_mark_uptodate(new_folio);
> - hugepage_add_new_anon_rmap(new_folio, vma, addr);
> + hugetlb_add_new_anon_rmap(new_folio, vma, addr);
> if (userfaultfd_wp(vma) && huge_pte_uffd_wp(old))
> newpte = huge_pte_mkuffd_wp(newpte);
> set_huge_pte_at(vma->vm_mm, addr, ptep, newpte, sz);
> @@ -5988,7 +5988,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> page_remove_rmap(&old_folio->page, vma, true);
> - hugepage_add_new_anon_rmap(new_folio, vma, haddr);
> + hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> set_huge_pte_at(mm, haddr, ptep, newpte, huge_page_size(h));
> @@ -6277,7 +6277,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> goto backout;
>
> if (anon_rmap)
> - hugepage_add_new_anon_rmap(folio, vma, haddr);
> + hugetlb_add_new_anon_rmap(folio, vma, haddr);
> else
> page_dup_file_rmap(&folio->page, true);
> new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> @@ -6732,7 +6732,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> if (folio_in_pagecache)
> page_dup_file_rmap(&folio->page, true);
> else
> - hugepage_add_new_anon_rmap(folio, dst_vma, dst_addr);
> + hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
>
> /*
> * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 35a88334bb3c..4cb849fa0dd2 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -249,8 +249,8 @@ static bool remove_migration_pte(struct folio *folio,
>
> pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
> if (folio_test_anon(folio))
> - hugepage_add_anon_rmap(folio, vma, pvmw.address,
> - rmap_flags);
> + hugetlb_add_anon_rmap(folio, vma, pvmw.address,
> + rmap_flags);
> else
> page_dup_file_rmap(new, true);
> set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 846fc79f3ca9..80d42c31281a 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2625,8 +2625,8 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
> *
> * RMAP_COMPOUND is ignored.
> */
> -void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> - unsigned long address, rmap_t flags)
> +void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags)
> {
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> @@ -2637,8 +2637,8 @@ void hugepage_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> PageAnonExclusive(&folio->page), folio);
> }
>
> -void hugepage_add_new_anon_rmap(struct folio *folio,
> - struct vm_area_struct *vma, unsigned long address)
> +void hugetlb_add_new_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma, unsigned long address)
> {
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> /* increment count (starts at -1) */

2023-12-11 16:16:41

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Note: all possible candidates that need care are page_remove_rmap() calls that
> pass compound=true.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 5 +++++
> mm/hugetlb.c | 4 ++--
> mm/rmap.c | 17 ++++++++---------
> 3 files changed, 15 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 0bfea866f39b..d85bd1d4de04 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,11 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_remove_rmap(struct folio *folio)
> +{
> + atomic_dec(&folio->_entire_mapcount);
> +}
> +
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> if (compound) {
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 305f3ca1dee6..ef48ae673890 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5676,7 +5676,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> make_pte_marker(PTE_MARKER_UFFD_WP),
> sz);
> hugetlb_count_sub(pages_per_huge_page(h), mm);
> - page_remove_rmap(page, vma, true);
> + hugetlb_remove_rmap(page_folio(page));
>
> spin_unlock(ptl);
> tlb_remove_page_size(tlb, page, huge_page_size(h));
> @@ -5987,7 +5987,7 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma,
>
> /* Break COW or unshare */
> huge_ptep_clear_flush(vma, haddr, ptep);
> - page_remove_rmap(&old_folio->page, vma, true);
> + hugetlb_remove_rmap(old_folio);
> hugetlb_add_new_anon_rmap(new_folio, vma, haddr);
> if (huge_pte_uffd_wp(pte))
> newpte = huge_pte_mkuffd_wp(newpte);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 80d42c31281a..4e60c1f38eaa 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }
> -
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> if (likely(!compound)) {
> last = atomic_add_negative(-1, &page->_mapcount);
> @@ -1846,7 +1839,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> dec_mm_counter(mm, mm_counter_file(&folio->page));
> }
> discard:
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);
> @@ -2199,7 +2195,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> */
> }
>
> - page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
> + if (unlikely(folio_test_hugetlb(folio)))
> + hugetlb_remove_rmap(folio);
> + else
> + page_remove_rmap(subpage, vma, false);
> if (vma->vm_flags & VM_LOCKED)
> mlock_drain_local();
> folio_put(folio);

2023-12-11 16:18:30

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 03/39] mm/rmap: introduce and use hugetlb_add_file_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Right now we're using page_dup_file_rmap() in some cases where "ordinary"
> rmap code would have used page_add_file_rmap(). So let's introduce and
> use hugetlb_add_file_rmap() instead. We won't be adding a
> "hugetlb_dup_file_rmap()" functon for the fork() case, as it would be
> doing the same: "dup" is just an optimization for "add".
>
> What remains is a single page_dup_file_rmap() call in fork() code.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/hugetlb.c | 6 +++---
> mm/migrate.c | 2 +-
> 3 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index d85bd1d4de04..91178d1aa028 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,13 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +static inline void hugetlb_add_file_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> +
> + atomic_inc(&folio->_entire_mapcount);
> +}
> +
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> atomic_dec(&folio->_entire_mapcount);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ef48ae673890..57e898187931 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5408,7 +5408,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> * sleep during the process.
> */
> if (!folio_test_anon(pte_folio)) {
> - page_dup_file_rmap(&pte_folio->page, true);
> + hugetlb_add_file_rmap(pte_folio);
> } else if (page_try_dup_anon_rmap(&pte_folio->page,
> true, src_vma)) {
> pte_t src_pte_old = entry;
> @@ -6279,7 +6279,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
> if (anon_rmap)
> hugetlb_add_new_anon_rmap(folio, vma, haddr);
> else
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> new_pte = make_huge_pte(vma, &folio->page, ((vma->vm_flags & VM_WRITE)
> && (vma->vm_flags & VM_SHARED)));
> /*
> @@ -6730,7 +6730,7 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
> goto out_release_unlock;
>
> if (folio_in_pagecache)
> - page_dup_file_rmap(&folio->page, true);
> + hugetlb_add_file_rmap(folio);
> else
> hugetlb_add_new_anon_rmap(folio, dst_vma, dst_addr);
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4cb849fa0dd2..de9d94b99ab7 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -252,7 +252,7 @@ static bool remove_migration_pte(struct folio *folio,
> hugetlb_add_anon_rmap(folio, vma, pvmw.address,
> rmap_flags);
> else
> - page_dup_file_rmap(new, true);
> + hugetlb_add_file_rmap(folio);
> set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte,
> psize);
> } else

2023-12-11 16:24:30

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 01/39] mm/rmap: rename hugepage_add* to hugetlb_add*

On Mon, Dec 11, 2023 at 04:56:14PM +0100, David Hildenbrand wrote:
> Let's just call it "hugetlb_".
>
> Yes, it's all already inconsistent and confusing because we have a lot
> of "hugepage_" functions for legacy reasons. But "hugetlb" cannot possibly
> be confused with transparent huge pages, and it matches "hugetlb.c" and
> "folio_test_hugetlb()". So let's minimize confusion in rmap code.
>
> Reviewed-by: Muchun Song <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

2023-12-11 16:26:34

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 04/39] mm/rmap: introduce and use hugetlb_try_dup_anon_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_dup_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that is_device_private_page() does not apply to hugetlb.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/mm.h | 12 +++++++++---
> include/linux/rmap.h | 15 +++++++++++++++
> mm/hugetlb.c | 3 +--
> 3 files changed, 25 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b72bf25a45cf..ae547b62f325 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1964,15 +1964,21 @@ static inline bool page_maybe_dma_pinned(struct page *page)
> *
> * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq.
> */
> -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> - struct page *page)
> +static inline bool folio_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct folio *folio)
> {
> VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1));
>
> if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags))
> return false;
>
> - return page_maybe_dma_pinned(page);
> + return folio_maybe_dma_pinned(folio);
> +}
> +
> +static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
> + struct page *page)
> +{
> + return folio_needs_cow_for_dma(vma, page_folio(page));
> }
>
> /**
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 91178d1aa028..ca42b3db5688 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -213,6 +213,21 @@ void hugetlb_add_anon_rmap(struct folio *, struct vm_area_struct *,
> void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
>
> +/* See page_try_dup_anon_rmap() */
> +static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> + struct vm_area_struct *vma)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> +
> + if (PageAnonExclusive(&folio->page)) {
> + if (unlikely(folio_needs_cow_for_dma(vma, folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> + }
> + atomic_inc(&folio->_entire_mapcount);
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 57e898187931..378e460a6ab4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5409,8 +5409,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
> */
> if (!folio_test_anon(pte_folio)) {
> hugetlb_add_file_rmap(pte_folio);
> - } else if (page_try_dup_anon_rmap(&pte_folio->page,
> - true, src_vma)) {
> + } else if (hugetlb_try_dup_anon_rmap(pte_folio, src_vma)) {
> pte_t src_pte_old = entry;
> struct folio *new_folio;
>

2023-12-11 16:29:37

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 05/39] mm/rmap: introduce and use hugetlb_try_share_anon_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> So let's introduce and use hugetlb_try_share_anon_rmap() to make all
> hugetlb handling use dedicated hugetlb_* rmap functions.
>
> Note that try_to_unmap_one() does not need care. Easy to spot because
> among all that nasty hugetlb special-casing in that function, we're not
> using set_huge_pte_at() on the anon path -- well, and that code assumes
> that we would want to swap out.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 23 +++++++++++++++++++++++
> mm/rmap.c | 15 ++++++++++-----
> 2 files changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index ca42b3db5688..4c0650e9f6db 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -228,6 +228,29 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> return 0;
> }
>
> +/* See page_try_share_anon_rmap() */
> +static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> +{
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> + VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
> +
> + /* Paired with the memory barrier in try_grab_folio(). */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb();
> +
> + if (unlikely(folio_maybe_dma_pinned(folio)))
> + return -EBUSY;
> + ClearPageAnonExclusive(&folio->page);
> +
> + /*
> + * This is conceptually a smp_wmb() paired with the smp_rmb() in
> + * gup_must_unshare().
> + */
> + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP))
> + smp_mb__after_atomic();
> + return 0;
> +}
> +
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4e60c1f38eaa..e210ac1b73de 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2147,13 +2147,18 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
> !anon_exclusive, subpage);
>
> /* See page_try_share_anon_rmap(): clear PTE first. */
> - if (anon_exclusive &&
> - page_try_share_anon_rmap(subpage)) {
> - if (folio_test_hugetlb(folio))
> + if (folio_test_hugetlb(folio)) {
> + if (anon_exclusive &&
> + hugetlb_try_share_anon_rmap(folio)) {
> set_huge_pte_at(mm, address, pvmw.pte,
> pteval, hsz);
> - else
> - set_pte_at(mm, address, pvmw.pte, pteval);
> + ret = false;
> + page_vma_mapped_walk_done(&pvmw);
> + break;
> + }
> + } else if (anon_exclusive &&
> + page_try_share_anon_rmap(subpage)) {
> + set_pte_at(mm, address, pvmw.pte, pteval);
> ret = false;
> page_vma_mapped_walk_done(&pvmw);
> break;

2023-12-11 16:30:50

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's make sure we end up with the right folios in the right functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 7 +++++++
> mm/rmap.c | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 4c0650e9f6db..e3857d26b944 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -217,6 +217,7 @@ void hugetlb_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> if (PageAnonExclusive(&folio->page)) {
> @@ -231,6 +232,7 @@ static inline int hugetlb_try_dup_anon_rmap(struct folio *folio,
> /* See page_try_share_anon_rmap() */
> static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> VM_WARN_ON_FOLIO(!PageAnonExclusive(&folio->page), folio);
>
> @@ -253,6 +255,7 @@ static inline int hugetlb_try_share_anon_rmap(struct folio *folio)
>
> static inline void hugetlb_add_file_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -260,11 +263,15 @@ static inline void hugetlb_add_file_rmap(struct folio *folio)
>
> static inline void hugetlb_remove_rmap(struct folio *folio)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> atomic_dec(&folio->_entire_mapcount);
> }
>
> static inline void __page_dup_rmap(struct page *page, bool compound)
> {
> + VM_WARN_ON(folio_test_hugetlb(page_folio(page)));
> +
> if (compound) {
> struct folio *folio = (struct folio *)page;
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index e210ac1b73de..41597da14f26 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1343,6 +1343,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> {
> int nr = folio_nr_pages(folio);
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_VMA(address < vma->vm_start ||
> address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
> __folio_set_swapbacked(folio);
> @@ -1395,6 +1396,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> @@ -1480,6 +1482,7 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
> bool last;
> enum node_stat_item idx;
>
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> /* Is page being unmapped by PTE? Is this its last map to be removed? */
> @@ -2632,6 +2635,7 @@ void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc)
> void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> atomic_inc(&folio->_entire_mapcount);
> @@ -2644,6 +2648,8 @@ void hugetlb_add_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> void hugetlb_add_new_anon_rmap(struct folio *folio,
> struct vm_area_struct *vma, unsigned long address)
> {
> + VM_WARN_ON_FOLIO(!folio_test_hugetlb(folio), folio);
> +
> BUG_ON(address < vma->vm_start || address >= vma->vm_end);
> /* increment count (starts at -1) */
> atomic_set(&folio->_entire_mapcount, 0);

2023-12-11 16:33:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On Mon, Dec 11, 2023 at 04:56:15PM +0100, David Hildenbrand wrote:
> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
> For example, hugetlb currently only supports entire mappings, and treats
> any mapping as mapped using a single "logical PTE". Let's move it out
> of the way so we can overhaul our "ordinary" rmap
> implementation/interface.
>
> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
> code from page_remove_rmap(). This effectively removes one check on the
> small-folio path as well.
>
> Note: all possible candidates that need care are page_remove_rmap() calls that
> pass compound=true.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>

> +++ b/mm/rmap.c
> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>
> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>
> - /* Hugetlb pages are not counted in NR_*MAPPED */
> - if (unlikely(folio_test_hugetlb(folio))) {
> - /* hugetlb pages are always mapped with pmds */
> - atomic_dec(&folio->_entire_mapcount);
> - return;
> - }

Maybe add
VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);

2023-12-11 16:36:34

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 02/39] mm/rmap: introduce and use hugetlb_remove_rmap()

On 11.12.23 17:33, Matthew Wilcox wrote:
> On Mon, Dec 11, 2023 at 04:56:15PM +0100, David Hildenbrand wrote:
>> hugetlb rmap handling differs quite a lot from "ordinary" rmap code.
>> For example, hugetlb currently only supports entire mappings, and treats
>> any mapping as mapped using a single "logical PTE". Let's move it out
>> of the way so we can overhaul our "ordinary" rmap
>> implementation/interface.
>>
>> Let's introduce and use hugetlb_remove_rmap() and remove the hugetlb
>> code from page_remove_rmap(). This effectively removes one check on the
>> small-folio path as well.
>>
>> Note: all possible candidates that need care are page_remove_rmap() calls that
>> pass compound=true.
>>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>
> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
>
>> +++ b/mm/rmap.c
>> @@ -1482,13 +1482,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
>>
>> VM_BUG_ON_PAGE(compound && !PageHead(page), page);
>>
>> - /* Hugetlb pages are not counted in NR_*MAPPED */
>> - if (unlikely(folio_test_hugetlb(folio))) {
>> - /* hugetlb pages are always mapped with pmds */
>> - atomic_dec(&folio->_entire_mapcount);
>> - return;
>> - }
>
> Maybe add
> VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);
>

I bulk-add that in patch #6.

Thanks!

--
Cheers,

David / dhildenb

2023-12-13 05:38:41

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()



On 2023/12/11 23:56, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly
> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline. Replace
> the "compound" check by a switch-case that will be removed by the
> compiler completely.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
> will catch if that ever changes.
>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and similarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

With one small comment.

> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
> 3 files changed, 95 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e3857d26b944..1753900f4aed 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
Maybe rmap_level for the enum name? To me, PTE and PMD are levels rather than
modes.
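
For reference, a minimal userspace sketch of the specialization pattern the
commit message describes: because the helper is __always_inline and every
wrapper passes a compile-time-constant enum, the compiler emits one
specialized body per granularity and the switch disappears. The "_model"
names are invented for this illustration and are not the kernel functions:

#include <stdio.h>

#define __always_inline inline __attribute__((__always_inline__))

enum rmap_mode { RMAP_MODE_PTE = 0, RMAP_MODE_PMD };

/*
 * Toy counters instead of the real mapcounts; the point is only the
 * always-inlined switch on a constant 'mode' argument.
 */
static __always_inline int add_rmap_model(int *pte_mapped, int *entire_mapped,
                                          int nr_pages, enum rmap_mode mode)
{
    switch (mode) {
    case RMAP_MODE_PTE:
        return *pte_mapped += nr_pages;
    case RMAP_MODE_PMD:
        return ++*entire_mapped;
    }
    return 0;
}

/* Thin wrappers, mirroring the folio_add_file_rmap_ptes()/_pmd() shape. */
static int add_rmap_ptes_model(int *pte_mapped, int *entire_mapped, int nr)
{
    return add_rmap_model(pte_mapped, entire_mapped, nr, RMAP_MODE_PTE);
}

static int add_rmap_pmd_model(int *pte_mapped, int *entire_mapped)
{
    return add_rmap_model(pte_mapped, entire_mapped, 1, RMAP_MODE_PMD);
}

int main(void)
{
    int pte_mapped = 0, entire_mapped = 0;

    add_rmap_ptes_model(&pte_mapped, &entire_mapped, 4);
    add_rmap_pmd_model(&pte_mapped, &entire_mapped);
    printf("pte-mapped: %d, entirely mapped: %d\n", pte_mapped, entire_mapped);
    return 0;
}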


> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(nr_pages <= 0);
> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8f0b936b90b5..6a5540ba3c65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 41597da14f26..4f30930a1162 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + break;
> + case RMAP_MODE_PMD:
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> nr = 0;
> }
> }
> + break;
> }
>
> if (nr_pmdmapped)
> @@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**

2023-12-13 08:47:50

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 13.12.23 06:33, Yin Fengwei wrote:
>
>
> On 2023/12/11 23:56, David Hildenbrand wrote:
>> Let's get rid of the compound parameter and instead define implicitly
>> which mappings we're adding. That is more future proof, easier to read
>> and harder to mess up.
>>
>> Use an enum to express the granularity internally. Make the compiler
>> always special-case on the granularity by using __always_inline. Replace
>> the "compound" check by a switch-case that will be removed by the
>> compiler completely.
>>
>> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
>> folio_test_pmd_mappable() check by a config check in the caller and
>> sanity checks. Convert the single user of folio_add_file_rmap_range().
>>
>> This function design can later easily be extended to PUDs and to batch
>> PMDs. Note that for now we don't support anything bigger than
>> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
>> will catch if that ever changes.
>>
>> Next up is removing page_remove_rmap() along with its "compound"
>> parameter and similarly converting all other rmap functions.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
> Reviewed-by: Yin Fengwei <[email protected]>
>

Thanks!

> With one small comment.
>
>> ---
>> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
>> mm/memory.c | 2 +-
>> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
>> 3 files changed, 95 insertions(+), 29 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index e3857d26b944..1753900f4aed 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
>> */
>> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>>
>> +/*
>> + * Internally, we're using an enum to specify the granularity. Usually,
>> + * we make the compiler create specialized variants for the different
>> + * granularity.
>> + */
>> +enum rmap_mode {
>> + RMAP_MODE_PTE = 0,
>> + RMAP_MODE_PMD,
>> +};
> Maybe rmap_level for the enum name? To me, PTE and PMD are levels rather than
> modes.

Originally, I wanted to call this "enum rmap_granularity", but that
turned out rather long. Agreed that "level" is better than "mode";
something resembling "granularity" would be even better.

--
Cheers,

David / dhildenb

2023-12-13 09:03:30

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 06/39] mm/rmap: add hugetlb sanity checks

On 11.12.23 16:56, David Hildenbrand wrote:
> Let's make sure we end up with the right folios in the right functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---

I'll move all !anon handling to the relevant patches, so for this patch
we'll only end up adding sanity checks for the "add" and "add_new" variants.

--
Cheers,

David / dhildenb

2023-12-15 02:26:59

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()



On 12/11/2023 11:56 PM, David Hildenbrand wrote:
> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
> replace page_add_anon_rmap() next.
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
> remove as soon as page_add_anon_rmap() is gone.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

With a small question below.

> ---
> include/linux/rmap.h | 6 +++
> mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
> 2 files changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 7198905dc8be..3b5357cb1c09 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
> * rmap interfaces called when adding or removing pte of page
> */
> void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
> +void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> +#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
> + folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
> +void folio_add_anon_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> void page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c5761986a411..7787499fa2ad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> struct folio *folio = page_folio(page);
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - int nr = 0, nr_pmdmapped = 0;
> - bool compound = flags & RMAP_COMPOUND;
> - bool first;
>
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> - first = atomic_inc_and_test(&page->_mapcount);
> - nr = first;
> - if (first && folio_test_large(folio)) {
> - nr = atomic_inc_return_relaxed(mapped);
> - nr = (nr < COMPOUND_MAPPED);
> - }
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> + if (likely(!(flags & RMAP_COMPOUND)))
> + folio_add_anon_rmap_pte(folio, page, vma, address, flags);
> + else
> + folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
> +}
>
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - }
> +static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags, enum rmap_mode mode)
> +{
> + unsigned int i, nr, nr_pmdmapped = 0;
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
> if (nr)
> @@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> * folio->index right when not given the address of the head
> * page.
> */
> - VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + mode != RMAP_MODE_PMD, folio);
> __folio_set_anon(folio, vma, address,
> !!(flags & RMAP_EXCLUSIVE));
> } else if (likely(!folio_test_ksm(folio))) {
> __page_check_anon_rmap(folio, page, vma, address);
> }
> - if (flags & RMAP_EXCLUSIVE)
> - SetPageAnonExclusive(page);
> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
> - PageAnonExclusive(page), folio);
> +
> + if (flags & RMAP_EXCLUSIVE) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + for (i = 0; i < nr_pages; i++)
> + SetPageAnonExclusive(page + i);
> + break;
> + case RMAP_MODE_PMD:
> + SetPageAnonExclusive(page);
> + break;
> + }
> + }
> + for (i = 0; i < nr_pages; i++) {
> + struct page *cur_page = page + i;
> +
> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
> + (folio_test_large(folio) &&
> + folio_entire_mapcount(folio) > 1)) &&
> + PageAnonExclusive(cur_page), folio);
> + }
This change will iterate over all pages for the PMD case. The original behavior
didn't check all pages. Is this change on purpose? Thanks.

>
> /*
> * For large folio, only mlock it if it's fully mapped to VMA. It's
> @@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages which will be mapped
> + * @vma: The vm area in which the mappings are added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting,
> + * and to ensure that an anon folio is not being upgraded racily to a KSM folio
> + * (but KSM folios are never downgraded).
> + */
> +void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma, unsigned long address,
> + rmap_t flags)
> +{
> + __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
> + RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting.
> + */
> +void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma, unsigned long address, rmap_t flags)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
> + RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
> * @folio: The folio to add the mapping to.

2023-12-15 02:27:43

by Yin, Fengwei

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()



On 12/11/2023 11:56 PM, David Hildenbrand wrote:
> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>
> While at it, use more folio operations (but only in the code branch we're
> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
> manually setting PageAnonExclusive.
>
> We should never see non-anon pages on that branch: otherwise, the
> existing page_add_anon_rmap() call would have been flawed already.
>
> Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Yin Fengwei <[email protected]>

2023-12-15 15:20:27

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

On 15.12.23 03:26, Yin, Fengwei wrote:
>
>
> On 12/11/2023 11:56 PM, David Hildenbrand wrote:
>> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
>> replace page_add_anon_rmap() next.
>>
>> Make the compiler always special-case on the granularity by using
>> __always_inline.
>>
>> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
>> remove as soon as page_add_anon_rmap() is gone.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
> Reviewed-by: Yin Fengwei <[email protected]>
>
> With a small question below.
>

Thanks!

[...]

>> + if (flags & RMAP_EXCLUSIVE) {
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + for (i = 0; i < nr_pages; i++)
>> + SetPageAnonExclusive(page + i);
>> + break;
>> + case RMAP_MODE_PMD:
>> + SetPageAnonExclusive(page);
>> + break;
>> + }
>> + }
>> + for (i = 0; i < nr_pages; i++) {
>> + struct page *cur_page = page + i;
>> +
>> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
>> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
>> + (folio_test_large(folio) &&
>> + folio_entire_mapcount(folio) > 1)) &&
>> + PageAnonExclusive(cur_page), folio);
>> + }
> This change will iterate over all pages for the PMD case. The original behavior
> didn't check all pages. Is this change on purpose? Thanks.

Yes, on purpose. I first thought about also separating the code paths
here, but realized that it makes much more sense to check each
individual subpage that is effectively getting mapped by that PMD,
instead of only the head page.

I'll add a comment to the patch description.

--
Cheers,

David / dhildenb


2023-12-18 15:56:37

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 08/39] mm/memory: page_add_file_rmap() -> folio_add_file_rmap_[pte|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert insert_page_into_pte_locked() and do_set_pmd(). While at it,
> perform some folio conversion.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/memory.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 6a5540ba3c65..70754fd65788 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1859,12 +1859,14 @@ static int validate_page_before_insert(struct page *page)
> static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
> unsigned long addr, struct page *page, pgprot_t prot)
> {
> + struct folio *folio = page_folio(page);
> +
> if (!pte_none(ptep_get(pte)))
> return -EBUSY;
> /* Ok, finally just insert the thing.. */
> - get_page(page);
> + folio_get(folio);
> inc_mm_counter(vma->vm_mm, mm_counter_file(page));
> - page_add_file_rmap(page, vma, false);
> + folio_add_file_rmap_pte(folio, page, vma);
> set_pte_at(vma->vm_mm, addr, pte, mk_pte(page, prot));
> return 0;
> }
> @@ -4409,6 +4411,7 @@ static void deposit_prealloc_pte(struct vm_fault *vmf)
>
> vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> {
> + struct folio *folio = page_folio(page);
> struct vm_area_struct *vma = vmf->vma;
> bool write = vmf->flags & FAULT_FLAG_WRITE;
> unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> @@ -4418,8 +4421,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
> return ret;
>
> - page = compound_head(page);
> - if (compound_order(page) != HPAGE_PMD_ORDER)
> + if (page != &folio->page || folio_order(folio) != HPAGE_PMD_ORDER)
> return ret;
>
> /*
> @@ -4428,7 +4430,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> * check. This kind of THP just can be PTE mapped. Access to
> * the corrupted subpage should trigger SIGBUS as expected.
> */
> - if (unlikely(PageHasHWPoisoned(page)))
> + if (unlikely(folio_test_has_hwpoisoned(folio)))
> return ret;
>
> /*
> @@ -4452,7 +4454,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
>
> add_mm_counter(vma->vm_mm, mm_counter_file(page), HPAGE_PMD_NR);
> - page_add_file_rmap(page, vma, true);
> + folio_add_file_rmap_pmd(folio, page, vma);
>
> /*
> * deposit and withdraw with pmd lock held


2023-12-18 15:59:26

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 07/39] mm/rmap: convert folio_add_file_rmap_range() into folio_add_file_rmap_[pte|ptes|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's get rid of the compound parameter and instead define implicitly

nit: think you mean explicitly

> which mappings we're adding. That is more future proof, easier to read
> and harder to mess up.
>
> Use an enum to express the granularity internally. Make the compiler
> always special-case on the granularity by using __always_inline. Replace
> the "compound" check by a switch-case that will be removed by the
> compiler completely.
>
> Add plenty of sanity checks with CONFIG_DEBUG_VM. Replace the
> folio_test_pmd_mappable() check by a config check in the caller and
> sanity checks. Convert the single user of folio_add_file_rmap_range().
>
> This function design can later easily be extended to PUDs and to batch
> PMDs. Note that for now we don't support anything bigger than
> PMD-sized folios (as we cleanly separated hugetlb handling). Sanity checks
> will catch if that ever changes.
>
> Next up is removing page_remove_rmap() along with its "compound"
> parameter and similarly converting all other rmap functions.
>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 47 +++++++++++++++++++++++++--
> mm/memory.c | 2 +-
> mm/rmap.c | 75 +++++++++++++++++++++++++++++---------------
> 3 files changed, 95 insertions(+), 29 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e3857d26b944..1753900f4aed 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -191,6 +191,45 @@ typedef int __bitwise rmap_t;
> */
> #define RMAP_COMPOUND ((__force rmap_t)BIT(1))
>
> +/*
> + * Internally, we're using an enum to specify the granularity. Usually,
> + * we make the compiler create specialized variants for the different
> + * granularity.
> + */
> +enum rmap_mode {
> + RMAP_MODE_PTE = 0,
> + RMAP_MODE_PMD,
> +};
> +
> +static inline void __folio_rmap_sanity_checks(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode)
> +{
> + /* hugetlb folios are handled separately. */
> + VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + !folio_test_large_rmappable(folio), folio);
> +
> + VM_WARN_ON_ONCE(nr_pages <= 0);
> + VM_WARN_ON_FOLIO(page_folio(page) != folio, folio);
> + VM_WARN_ON_FOLIO(page_folio(page + nr_pages - 1) != folio, folio);
> +
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + break;
> + case RMAP_MODE_PMD:
> + /*
> + * We don't support folios larger than a single PMD yet. So
> + * when RMAP_MODE_PMD is set, we assume that we are creating
> + * a single "entire" mapping of the folio.
> + */
> + VM_WARN_ON_FOLIO(folio_nr_pages(folio) != HPAGE_PMD_NR, folio);
> + VM_WARN_ON_FOLIO(nr_pages != HPAGE_PMD_NR, folio);
> + break;
> + default:
> + VM_WARN_ON_ONCE(true);
> + }
> +}
> +
> /*
> * rmap interfaces called when adding or removing pte of page
> */
> @@ -203,8 +242,12 @@ void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> void page_add_file_rmap(struct page *, struct vm_area_struct *,
> bool compound);
> -void folio_add_file_rmap_range(struct folio *, struct page *, unsigned int nr,
> - struct vm_area_struct *, bool compound);
> +void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *);
> +#define folio_add_file_rmap_pte(folio, page, vma) \
> + folio_add_file_rmap_ptes(folio, page, 1, vma)
> +void folio_add_file_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *);
> void page_remove_rmap(struct page *, struct vm_area_struct *,
> bool compound);
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 8f0b936b90b5..6a5540ba3c65 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4515,7 +4515,7 @@ void set_pte_range(struct vm_fault *vmf, struct folio *folio,
> folio_add_lru_vma(folio, vma);
> } else {
> add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> - folio_add_file_rmap_range(folio, page, nr, vma, false);
> + folio_add_file_rmap_ptes(folio, page, nr, vma);
> }
> set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr);
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 41597da14f26..4f30930a1162 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1376,31 +1376,20 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
> __lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
> }
>
> -/**
> - * folio_add_file_rmap_range - add pte mapping to page range of a folio
> - * @folio: The folio to add the mapping to
> - * @page: The first page to add
> - * @nr_pages: The number of pages which will be mapped
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The page range of folio is defined by [first_page, first_page + nr_pages)
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> - unsigned int nr_pages, struct vm_area_struct *vma,
> - bool compound)
> +static __always_inline void __folio_add_file_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + enum rmap_mode mode)
> {
> atomic_t *mapped = &folio->_nr_pages_mapped;
> unsigned int nr_pmdmapped = 0, first;
> int nr = 0;
>
> - VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
> - VM_WARN_ON_FOLIO(compound && !folio_test_pmd_mappable(folio), folio);
> + VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>
> /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> do {
> first = atomic_inc_and_test(&page->_mapcount);
> if (first && folio_test_large(folio)) {
> @@ -1411,9 +1400,8 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> if (first)
> nr++;
> } while (page++, --nr_pages > 0);
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> -
> + break;
> + case RMAP_MODE_PMD:
> first = atomic_inc_and_test(&folio->_entire_mapcount);
> if (first) {
> nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> @@ -1428,6 +1416,7 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> nr = 0;
> }
> }
> + break;
> }
>
> if (nr_pmdmapped)
> @@ -1441,6 +1430,43 @@ void folio_add_file_rmap_range(struct folio *folio, struct page *page,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_file_rmap_ptes - add PTE mappings to a page range of a folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages that will be mapped using PTEs
> + * @vma: The vm area in which the mappings are added
> + *
> + * The page range of the folio is defined by [page, page + nr_pages)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma)
> +{
> + __folio_add_file_rmap(folio, page, nr_pages, vma, RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_file_rmap_pmd - add a PMD mapping to a page range of a folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + *
> + * The page range of the folio is defined by [page, page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock.
> + */
> +void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_file_rmap(folio, page, HPAGE_PMD_NR, vma, RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * page_add_file_rmap - add pte mapping to a file page
> * @page: the page to add the mapping to
> @@ -1453,16 +1479,13 @@ void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> bool compound)
> {
> struct folio *folio = page_folio(page);
> - unsigned int nr_pages;
>
> VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
>
> if (likely(!compound))
> - nr_pages = 1;
> + folio_add_file_rmap_pte(folio, page, vma);
> else
> - nr_pages = folio_nr_pages(folio);
> -
> - folio_add_file_rmap_range(folio, page, nr_pages, vma, compound);
> + folio_add_file_rmap_pmd(folio, page, vma);
> }
>
> /**


2023-12-18 16:00:48

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 09/39] mm/huge_memory: page_add_file_rmap() -> folio_add_file_rmap_pmd()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert remove_migration_pmd() and while at it, perform some folio
> conversion.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/huge_memory.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 3a387c6f18b6..1f5634b2f374 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -3577,6 +3577,7 @@ int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw,
>
> void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> {
> + struct folio *folio = page_folio(new);
> struct vm_area_struct *vma = pvmw->vma;
> struct mm_struct *mm = vma->vm_mm;
> unsigned long address = pvmw->address;
> @@ -3588,7 +3589,7 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> return;
>
> entry = pmd_to_swp_entry(*pvmw->pmd);
> - get_page(new);
> + folio_get(folio);
> pmde = mk_huge_pmd(new, READ_ONCE(vma->vm_page_prot));
> if (pmd_swp_soft_dirty(*pvmw->pmd))
> pmde = pmd_mksoft_dirty(pmde);
> @@ -3599,10 +3600,10 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
> if (!is_migration_entry_young(entry))
> pmde = pmd_mkold(pmde);
> /* NOTE: this may contain setting soft-dirty on some archs */
> - if (PageDirty(new) && is_migration_entry_dirty(entry))
> + if (folio_test_dirty(folio) && is_migration_entry_dirty(entry))
> pmde = pmd_mkdirty(pmde);
>
> - if (PageAnon(new)) {
> + if (folio_test_anon(folio)) {
> rmap_t rmap_flags = RMAP_COMPOUND;
>
> if (!is_readable_migration_entry(entry))
> @@ -3610,9 +3611,9 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
>
> page_add_anon_rmap(new, vma, haddr, rmap_flags);
> } else {
> - page_add_file_rmap(new, vma, true);
> + folio_add_file_rmap_pmd(folio, new, vma);
> }
> - VM_BUG_ON(pmd_write(pmde) && PageAnon(new) && !PageAnonExclusive(new));
> + VM_BUG_ON(pmd_write(pmde) && folio_test_anon(folio) && !PageAnonExclusive(new));
> set_pmd_at(mm, haddr, pvmw->pmd, pmde);
>
> /* No need to invalidate - it was non-present before */


2023-12-18 16:01:22

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 12/39] mm/rmap: remove page_add_file_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> All users are gone, let's remove it.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> include/linux/rmap.h | 2 --
> mm/rmap.c | 21 ---------------------
> 2 files changed, 23 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 1753900f4aed..7198905dc8be 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -240,8 +240,6 @@ void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address);
> void folio_add_new_anon_rmap(struct folio *, struct vm_area_struct *,
> unsigned long address);
> -void page_add_file_rmap(struct page *, struct vm_area_struct *,
> - bool compound);
> void folio_add_file_rmap_ptes(struct folio *, struct page *, int nr_pages,
> struct vm_area_struct *);
> #define folio_add_file_rmap_pte(folio, page, vma) \
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 4f30930a1162..2ff2f11275e5 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1467,27 +1467,6 @@ void folio_add_file_rmap_pmd(struct folio *folio, struct page *page,
> #endif
> }
>
> -/**
> - * page_add_file_rmap - add pte mapping to a file page
> - * @page: the page to add the mapping to
> - * @vma: the vm area in which the mapping is added
> - * @compound: charge the page as compound or small page
> - *
> - * The caller needs to hold the pte lock.
> - */
> -void page_add_file_rmap(struct page *page, struct vm_area_struct *vma,
> - bool compound)
> -{
> - struct folio *folio = page_folio(page);
> -
> - VM_WARN_ON_ONCE_PAGE(compound && !PageTransHuge(page), page);
> -
> - if (likely(!compound))
> - folio_add_file_rmap_pte(folio, page, vma);
> - else
> - folio_add_file_rmap_pmd(folio, page, vma);
> -}
> -
> /**
> * page_remove_rmap - take down pte mapping from a page
> * @page: page to remove mapping from


2023-12-18 16:02:14

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 11/39] mm/userfaultfd: page_add_file_rmap() -> folio_add_file_rmap_pte()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert mfill_atomic_install_pte().
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/userfaultfd.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 9ec814e47e99..330a481a1654 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -114,7 +114,7 @@ int mfill_atomic_install_pte(pmd_t *dst_pmd,
> /* Usually, cache pages are already added to LRU */
> if (newly_allocated)
> folio_add_lru(folio);
> - page_add_file_rmap(page, dst_vma, false);
> + folio_add_file_rmap_pte(folio, page, dst_vma);
> } else {
> page_add_new_anon_rmap(page, dst_vma, dst_addr);
> folio_add_lru_vma(folio, dst_vma);


2023-12-18 16:07:22

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 10/39] mm/migrate: page_add_file_rmap() -> folio_add_file_rmap_pte()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's convert remove_migration_pte().
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>

Reviewed-by: Ryan Roberts <[email protected]>

> ---
> mm/migrate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index de9d94b99ab7..efc19f53b05e 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -262,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
> page_add_anon_rmap(new, vma, pvmw.address,
> rmap_flags);
> else
> - page_add_file_rmap(new, vma, false);
> + folio_add_file_rmap_pte(folio, new, vma);
> set_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
> }
> if (vma->vm_flags & VM_LOCKED)


2023-12-18 16:18:35

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's factor it out to prepare for reuse as we convert
> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Reviewed-by: Yin Fengwei <[email protected]>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
> 1 file changed, 45 insertions(+), 36 deletions(-)
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 2ff2f11275e5..c5761986a411 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
> return mapcount;
> }
>
> +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
> + struct page *page, int nr_pages, enum rmap_mode mode,
> + unsigned int *nr_pmdmapped)
> +{
> + atomic_t *mapped = &folio->_nr_pages_mapped;
> + int first, nr = 0;
> +
> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> +
> + /* Is page being mapped by PTE? Is this its first map to be added? */

I suspect this comment is left over from the old version? It sounds a bit odd in
its new context.

> + switch (mode) {
> + case RMAP_MODE_PTE:
> + do {
> + first = atomic_inc_and_test(&page->_mapcount);
> + if (first && folio_test_large(folio)) {
> + first = atomic_inc_return_relaxed(mapped);
> + first = (first < COMPOUND_MAPPED);
> + }
> +
> + if (first)
> + nr++;
> + } while (page++, --nr_pages > 0);
> + break;
> + case RMAP_MODE_PMD:
> + first = atomic_inc_and_test(&folio->_entire_mapcount);
> + if (first) {
> + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> + *nr_pmdmapped = folio_nr_pages(folio);
> + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> + /* Raced ahead of a remove and another add? */
> + if (unlikely(nr < 0))
> + nr = 0;
> + } else {
> + /* Raced ahead of a remove of COMPOUND_MAPPED */
> + nr = 0;
> + }
> + }
> + break;
> + }
> + return nr;
> +}
> +
> /**
> * folio_move_anon_rmap - move a folio to our anon_vma
> * @folio: The folio to move to our anon_vma
> @@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> struct page *page, int nr_pages, struct vm_area_struct *vma,
> enum rmap_mode mode)
> {
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - unsigned int nr_pmdmapped = 0, first;
> - int nr = 0;
> + unsigned int nr, nr_pmdmapped = 0;

You're still being inconsistent with signed/unsigned here. Is there a reason
these can't be signed like nr_pages in the interface?

>
> VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> - __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
> -
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - switch (mode) {
> - case RMAP_MODE_PTE:
> - do {
> - first = atomic_inc_and_test(&page->_mapcount);
> - if (first && folio_test_large(folio)) {
> - first = atomic_inc_return_relaxed(mapped);
> - first = (first < COMPOUND_MAPPED);
> - }
> -
> - if (first)
> - nr++;
> - } while (page++, --nr_pages > 0);
> - break;
> - case RMAP_MODE_PMD:
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - break;
> - }
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);


2023-12-18 16:26:57

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's mimic what we did with folio_add_file_rmap_*() so we can similarly
> replace page_add_anon_rmap() next.
>
> Make the compiler always special-case on the granularity by using
> __always_inline.
>
> Note that the new functions ignore the RMAP_COMPOUND flag, which we will
> remove as soon as page_add_anon_rmap() is gone.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> include/linux/rmap.h | 6 +++
> mm/rmap.c | 118 ++++++++++++++++++++++++++++++-------------
> 2 files changed, 88 insertions(+), 36 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 7198905dc8be..3b5357cb1c09 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -234,6 +234,12 @@ static inline void __folio_rmap_sanity_checks(struct folio *folio,
> * rmap interfaces called when adding or removing pte of page
> */
> void folio_move_anon_rmap(struct folio *, struct vm_area_struct *);
> +void folio_add_anon_rmap_ptes(struct folio *, struct page *, int nr_pages,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> +#define folio_add_anon_rmap_pte(folio, page, vma, address, flags) \
> + folio_add_anon_rmap_ptes(folio, page, 1, vma, address, flags)
> +void folio_add_anon_rmap_pmd(struct folio *, struct page *,
> + struct vm_area_struct *, unsigned long address, rmap_t flags);
> void page_add_anon_rmap(struct page *, struct vm_area_struct *,
> unsigned long address, rmap_t flags);
> void page_add_new_anon_rmap(struct page *, struct vm_area_struct *,
> diff --git a/mm/rmap.c b/mm/rmap.c
> index c5761986a411..7787499fa2ad 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1300,38 +1300,20 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> unsigned long address, rmap_t flags)
> {
> struct folio *folio = page_folio(page);
> - atomic_t *mapped = &folio->_nr_pages_mapped;
> - int nr = 0, nr_pmdmapped = 0;
> - bool compound = flags & RMAP_COMPOUND;
> - bool first;
>
> - /* Is page being mapped by PTE? Is this its first map to be added? */
> - if (likely(!compound)) {
> - first = atomic_inc_and_test(&page->_mapcount);
> - nr = first;
> - if (first && folio_test_large(folio)) {
> - nr = atomic_inc_return_relaxed(mapped);
> - nr = (nr < COMPOUND_MAPPED);
> - }
> - } else if (folio_test_pmd_mappable(folio)) {
> - /* That test is redundant: it's for safety or to optimize out */
> + if (likely(!(flags & RMAP_COMPOUND)))
> + folio_add_anon_rmap_pte(folio, page, vma, address, flags);
> + else
> + folio_add_anon_rmap_pmd(folio, page, vma, address, flags);
> +}
>
> - first = atomic_inc_and_test(&folio->_entire_mapcount);
> - if (first) {
> - nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
> - if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
> - nr_pmdmapped = folio_nr_pages(folio);
> - nr = nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
> - /* Raced ahead of a remove and another add? */
> - if (unlikely(nr < 0))
> - nr = 0;
> - } else {
> - /* Raced ahead of a remove of COMPOUND_MAPPED */
> - nr = 0;
> - }
> - }
> - }
> +static __always_inline void __folio_add_anon_rmap(struct folio *folio,
> + struct page *page, int nr_pages, struct vm_area_struct *vma,
> + unsigned long address, rmap_t flags, enum rmap_mode mode)
> +{
> + unsigned int i, nr, nr_pmdmapped = 0;
>
> + nr = __folio_add_rmap(folio, page, nr_pages, mode, &nr_pmdmapped);
> if (nr_pmdmapped)
> __lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr_pmdmapped);
> if (nr)
> @@ -1345,18 +1327,34 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> * folio->index right when not given the address of the head
> * page.
> */
> - VM_WARN_ON_FOLIO(folio_test_large(folio) && !compound, folio);
> + VM_WARN_ON_FOLIO(folio_test_large(folio) &&
> + mode != RMAP_MODE_PMD, folio);
> __folio_set_anon(folio, vma, address,
> !!(flags & RMAP_EXCLUSIVE));
> } else if (likely(!folio_test_ksm(folio))) {
> __page_check_anon_rmap(folio, page, vma, address);
> }
> - if (flags & RMAP_EXCLUSIVE)
> - SetPageAnonExclusive(page);
> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
> - PageAnonExclusive(page), folio);
> +
> + if (flags & RMAP_EXCLUSIVE) {
> + switch (mode) {
> + case RMAP_MODE_PTE:
> + for (i = 0; i < nr_pages; i++)
> + SetPageAnonExclusive(page + i);
> + break;
> + case RMAP_MODE_PMD:
> + SetPageAnonExclusive(page);

Just to check; I suppose only setting this on the head is ok, because it's an
exclusive mapping and therefore by definition it can only be mapped by pmd?

> + break;
> + }
> + }
> + for (i = 0; i < nr_pages; i++) {
> + struct page *cur_page = page + i;
> +
> + /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
> + VM_WARN_ON_FOLIO((atomic_read(&cur_page->_mapcount) > 0 ||
> + (folio_test_large(folio) &&
> + folio_entire_mapcount(folio) > 1)) &&
> + PageAnonExclusive(cur_page), folio);
> + }
>
> /*
> * For large folio, only mlock it if it's fully mapped to VMA. It's
> @@ -1368,6 +1366,54 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
> mlock_vma_folio(folio, vma);
> }
>
> +/**
> + * folio_add_anon_rmap_ptes - add PTE mappings to a page range of an anon folio
> + * @folio: The folio to add the mappings to
> + * @page: The first page to add
> + * @nr_pages: The number of pages which will be mapped
> + * @vma: The vm area in which the mappings are added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + nr_pages)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting,
> + * and to ensure that an anon folio is not being upgraded racily to a KSM folio
> + * (but KSM folios are never downgraded).
> + */
> +void folio_add_anon_rmap_ptes(struct folio *folio, struct page *page,
> + int nr_pages, struct vm_area_struct *vma, unsigned long address,
> + rmap_t flags)
> +{
> + __folio_add_anon_rmap(folio, page, nr_pages, vma, address, flags,
> + RMAP_MODE_PTE);
> +}
> +
> +/**
> + * folio_add_anon_rmap_pmd - add a PMD mapping to a page range of an anon folio
> + * @folio: The folio to add the mapping to
> + * @page: The first page to add
> + * @vma: The vm area in which the mapping is added
> + * @address: The user virtual address of the first page to map
> + * @flags: The rmap flags
> + *
> + * The page range of folio is defined by [first_page, first_page + HPAGE_PMD_NR)
> + *
> + * The caller needs to hold the page table lock, and the page must be locked in
> + * the anon_vma case: to serialize mapping,index checking after setting.
> + */
> +void folio_add_anon_rmap_pmd(struct folio *folio, struct page *page,
> + struct vm_area_struct *vma, unsigned long address, rmap_t flags)
> +{
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> + __folio_add_anon_rmap(folio, page, HPAGE_PMD_NR, vma, address, flags,
> + RMAP_MODE_PMD);
> +#else
> + WARN_ON_ONCE(true);
> +#endif
> +}
> +
> /**
> * folio_add_new_anon_rmap - Add mapping to a new anonymous folio.
> * @folio: The folio to add the mapping to.


2023-12-18 16:28:47

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 11/12/2023 15:56, David Hildenbrand wrote:
> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>
> While at it, use more folio operations (but only in the code branch we're
> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
> manually setting PageAnonExclusive.
>
> We should never see non-anon pages on that branch: otherwise, the
> existing page_add_anon_rmap() call would have been flawed already.
>
> Signed-off-by: David Hildenbrand <[email protected]>
> ---
> mm/huge_memory.c | 23 +++++++++++++++--------
> 1 file changed, 15 insertions(+), 8 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 1f5634b2f374..82ad68fe0d12 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> unsigned long haddr, bool freeze)
> {
> struct mm_struct *mm = vma->vm_mm;
> + struct folio *folio;
> struct page *page;
> pgtable_t pgtable;
> pmd_t old_pmd, _pmd;
> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> uffd_wp = pmd_swp_uffd_wp(old_pmd);
> } else {
> page = pmd_page(old_pmd);
> + folio = page_folio(page);
> if (pmd_dirty(old_pmd)) {
> dirty = true;
> - SetPageDirty(page);
> + folio_set_dirty(folio);
> }
> write = pmd_write(old_pmd);
> young = pmd_young(old_pmd);
> soft_dirty = pmd_soft_dirty(old_pmd);
> uffd_wp = pmd_uffd_wp(old_pmd);
>
> - VM_BUG_ON_PAGE(!page_count(page), page);
> + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);

Is this warning really correct? File-backed memory can be PMD-mapped with
CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
remapped as PTEs? Although I guess if we did have a file-backed folio, it
definitely wouldn't be correct to call page_add_anon_rmap() /
folio_add_anon_rmap_ptes()...

>
> /*
> * Without "freeze", we'll simply split the PMD, propagating the
> @@ -2519,11 +2522,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> *
> * See page_try_share_anon_rmap(): invalidate PMD first.
> */
> - anon_exclusive = PageAnon(page) && PageAnonExclusive(page);
> + anon_exclusive = PageAnonExclusive(page);
> if (freeze && anon_exclusive && page_try_share_anon_rmap(page))
> freeze = false;
> - if (!freeze)
> - page_ref_add(page, HPAGE_PMD_NR - 1);
> + if (!freeze) {
> + rmap_t rmap_flags = RMAP_NONE;
> +
> + folio_ref_add(folio, HPAGE_PMD_NR - 1);
> + if (anon_exclusive)
> + rmap_flags |= RMAP_EXCLUSIVE;
> + folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR,
> + vma, haddr, rmap_flags);
> + }
> }
>
> /*
> @@ -2566,8 +2576,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = mk_pte(page + i, READ_ONCE(vma->vm_page_prot));
> if (write)
> entry = pte_mkwrite(entry, vma);
> - if (anon_exclusive)
> - SetPageAnonExclusive(page + i);
> if (!young)
> entry = pte_mkold(entry);
> /* NOTE: this may set soft-dirty too on some archs */
> @@ -2577,7 +2585,6 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
> entry = pte_mksoft_dirty(entry);
> if (uffd_wp)
> entry = pte_mkuffd_wp(entry);
> - page_add_anon_rmap(page + i, vma, addr, RMAP_NONE);
> }
> VM_BUG_ON(!pte_none(ptep_get(pte)));
> set_pte_at(mm, addr, pte, entry);


2023-12-18 17:02:55

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 14/39] mm/rmap: introduce folio_add_anon_rmap_[pte|ptes|pmd]()


>> - if (flags & RMAP_EXCLUSIVE)
>> - SetPageAnonExclusive(page);
>> - /* While PTE-mapping a THP we have a PMD and a PTE mapping. */
>> - VM_WARN_ON_FOLIO((atomic_read(&page->_mapcount) > 0 ||
>> - (folio_test_large(folio) && folio_entire_mapcount(folio) > 1)) &&
>> - PageAnonExclusive(page), folio);
>> +
>> + if (flags & RMAP_EXCLUSIVE) {
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + for (i = 0; i < nr_pages; i++)
>> + SetPageAnonExclusive(page + i);
>> + break;
>> + case RMAP_MODE_PMD:
>> + SetPageAnonExclusive(page);
>
> Just to check; I suppose only setting this on the head is ok, because it's an
> exclusive mapping and therefore by definition it can only be mapped by pmd?

Yes. And when PTE-remapping, we will push the flag to all tail pages. No
change in behavior :)
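
For reference, a condensed sketch of that flow, pieced together from the quoted
patches (simplified, not a literal excerpt):

    /* __split_huge_pmd_locked(): carry the head page's flag as RMAP_EXCLUSIVE. */
    rmap_t rmap_flags = anon_exclusive ? RMAP_EXCLUSIVE : RMAP_NONE;
    folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr, rmap_flags);

    /* ...which, on the RMAP_MODE_PTE path with RMAP_EXCLUSIVE set, does: */
    for (i = 0; i < nr_pages; i++)
            SetPageAnonExclusive(page + i);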

--
Cheers,

David / dhildenb


2023-12-18 17:04:01

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 18.12.23 17:22, Ryan Roberts wrote:
> On 11/12/2023 15:56, David Hildenbrand wrote:
>> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>>
>> While at it, use more folio operations (but only in the code branch we're
>> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
>> manually setting PageAnonExclusive.
>>
>> We should never see non-anon pages on that branch: otherwise, the
>> existing page_add_anon_rmap() call would have been flawed already.
>>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/huge_memory.c | 23 +++++++++++++++--------
>> 1 file changed, 15 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 1f5634b2f374..82ad68fe0d12 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> unsigned long haddr, bool freeze)
>> {
>> struct mm_struct *mm = vma->vm_mm;
>> + struct folio *folio;
>> struct page *page;
>> pgtable_t pgtable;
>> pmd_t old_pmd, _pmd;
>> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
>> uffd_wp = pmd_swp_uffd_wp(old_pmd);
>> } else {
>> page = pmd_page(old_pmd);
>> + folio = page_folio(page);
>> if (pmd_dirty(old_pmd)) {
>> dirty = true;
>> - SetPageDirty(page);
>> + folio_set_dirty(folio);
>> }
>> write = pmd_write(old_pmd);
>> young = pmd_young(old_pmd);
>> soft_dirty = pmd_soft_dirty(old_pmd);
>> uffd_wp = pmd_uffd_wp(old_pmd);
>>
>> - VM_BUG_ON_PAGE(!page_count(page), page);
>> + VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
>> + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>
> Is this warning really correct? File-backed memory can be PMD-mapped with
> CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
> remapped as PTEs? Although I guess if we did have a file-backed folio, it
> definitely wouldn't be correct to call page_add_anon_rmap() /
> folio_add_anon_rmap_ptes()...

Yes, see the patch description where I spell that out.

PTE-remapping a file-backed folio will simply zap the PMD and refault from
the page cache after creating a page table.

So this is anon-only code.
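
Put differently, only anon folios take the batched remap path here; a rough
sketch of the behaviour described above (paraphrased, not a literal excerpt):

    if (folio_test_anon(folio)) {
            /* anon THP: remap the subpages via PTEs, batching the rmap update */
            folio_add_anon_rmap_ptes(folio, page, HPAGE_PMD_NR, vma, haddr,
                                     rmap_flags);
    } else {
            /*
             * file-backed THP (e.g. READ_ONLY_THP_FOR_FS): no PTE remap here;
             * the PMD is zapped and the pages are refaulted from the page cache.
             */
    }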

--
Cheers,

David / dhildenb


2023-12-18 17:06:29

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 18.12.23 17:07, Ryan Roberts wrote:
> On 11/12/2023 15:56, David Hildenbrand wrote:
>> Let's factor it out to prepare for reuse as we convert
>> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>>
>> Make the compiler always special-case on the granularity by using
>> __always_inline.
>>
>> Reviewed-by: Yin Fengwei <[email protected]>
>> Signed-off-by: David Hildenbrand <[email protected]>
>> ---
>> mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
>> 1 file changed, 45 insertions(+), 36 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2ff2f11275e5..c5761986a411 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
>> return mapcount;
>> }
>>
>> +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
>> + struct page *page, int nr_pages, enum rmap_mode mode,
>> + unsigned int *nr_pmdmapped)
>> +{
>> + atomic_t *mapped = &folio->_nr_pages_mapped;
>> + int first, nr = 0;
>> +
>> + __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>> +
>> + /* Is page being mapped by PTE? Is this its first map to be added? */
>
> I suspect this comment is left over from the old version? It sounds a bit odd in
> its new context.

In this patch, I'm just moving the code, so it would have to be dropped
in a previous patch.

I'm happy to drop all these comments in previous patches.

>
>> + switch (mode) {
>> + case RMAP_MODE_PTE:
>> + do {
>> + first = atomic_inc_and_test(&page->_mapcount);
>> + if (first && folio_test_large(folio)) {
>> + first = atomic_inc_return_relaxed(mapped);
>> + first = (first < COMPOUND_MAPPED);
>> + }
>> +
>> + if (first)
>> + nr++;
>> + } while (page++, --nr_pages > 0);
>> + break;
>> + case RMAP_MODE_PMD:
>> + first = atomic_inc_and_test(&folio->_entire_mapcount);
>> + if (first) {
>> + nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
>> + if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
>> + *nr_pmdmapped = folio_nr_pages(folio);
>> + nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
>> + /* Raced ahead of a remove and another add? */
>> + if (unlikely(nr < 0))
>> + nr = 0;
>> + } else {
>> + /* Raced ahead of a remove of COMPOUND_MAPPED */
>> + nr = 0;
>> + }
>> + }
>> + break;
>> + }
>> + return nr;
>> +}
>> +
>> /**
>> * folio_move_anon_rmap - move a folio to our anon_vma
>> * @folio: The folio to move to our anon_vma
>> @@ -1380,45 +1423,11 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
>> struct page *page, int nr_pages, struct vm_area_struct *vma,
>> enum rmap_mode mode)
>> {
>> - atomic_t *mapped = &folio->_nr_pages_mapped;
>> - unsigned int nr_pmdmapped = 0, first;
>> - int nr = 0;
>> + unsigned int nr, nr_pmdmapped = 0;
>
> You're still being inconsistent with signed/unsigned here. Is there a reason
> these can't be signed like nr_pages in the interface?

I can turn them into signed values.

Personally, I think it's misleading to use "signed" for values that can
never meaningfully be negative. But sure, we can be consistent, at least
in rmap code.

--
Cheers,

David / dhildenb


2023-12-19 08:41:07

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 13/39] mm/rmap: factor out adding folio mappings into __folio_add_rmap()

On 18/12/2023 17:06, David Hildenbrand wrote:
> On 18.12.23 17:07, Ryan Roberts wrote:
>> On 11/12/2023 15:56, David Hildenbrand wrote:
>>> Let's factor it out to prepare for reuse as we convert
>>> page_add_anon_rmap() to folio_add_anon_rmap_[pte|ptes|pmd]().
>>>
>>> Make the compiler always special-case on the granularity by using
>>> __always_inline.
>>>
>>> Reviewed-by: Yin Fengwei <[email protected]>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>>   mm/rmap.c | 81 ++++++++++++++++++++++++++++++-------------------------
>>>   1 file changed, 45 insertions(+), 36 deletions(-)
>>>
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 2ff2f11275e5..c5761986a411 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1157,6 +1157,49 @@ int folio_total_mapcount(struct folio *folio)
>>>       return mapcount;
>>>   }
>>>   +static __always_inline unsigned int __folio_add_rmap(struct folio *folio,
>>> +        struct page *page, int nr_pages, enum rmap_mode mode,
>>> +        unsigned int *nr_pmdmapped)
>>> +{
>>> +    atomic_t *mapped = &folio->_nr_pages_mapped;
>>> +    int first, nr = 0;
>>> +
>>> +    __folio_rmap_sanity_checks(folio, page, nr_pages, mode);
>>> +
>>> +    /* Is page being mapped by PTE? Is this its first map to be added? */
>>
>> I suspect this comment is left over from the old version? It sounds a bit odd in
>> its new context.
>
> In this patch, I'm just moving the code, so it would have to be dropped in a
> previous patch.
>
> I'm happy to drop all these comments in previous patches.

Well it doesn't really mean much to me in this new context, so I would reword if
there is still something you need to convey to the reader, else just remove.

>
>>
>>> +    switch (mode) {
>>> +    case RMAP_MODE_PTE:
>>> +        do {
>>> +            first = atomic_inc_and_test(&page->_mapcount);
>>> +            if (first && folio_test_large(folio)) {
>>> +                first = atomic_inc_return_relaxed(mapped);
>>> +                first = (first < COMPOUND_MAPPED);
>>> +            }
>>> +
>>> +            if (first)
>>> +                nr++;
>>> +        } while (page++, --nr_pages > 0);
>>> +        break;
>>> +    case RMAP_MODE_PMD:
>>> +        first = atomic_inc_and_test(&folio->_entire_mapcount);
>>> +        if (first) {
>>> +            nr = atomic_add_return_relaxed(COMPOUND_MAPPED, mapped);
>>> +            if (likely(nr < COMPOUND_MAPPED + COMPOUND_MAPPED)) {
>>> +                *nr_pmdmapped = folio_nr_pages(folio);
>>> +                nr = *nr_pmdmapped - (nr & FOLIO_PAGES_MAPPED);
>>> +                /* Raced ahead of a remove and another add? */
>>> +                if (unlikely(nr < 0))
>>> +                    nr = 0;
>>> +            } else {
>>> +                /* Raced ahead of a remove of COMPOUND_MAPPED */
>>> +                nr = 0;
>>> +            }
>>> +        }
>>> +        break;
>>> +    }
>>> +    return nr;
>>> +}
>>> +
>>>   /**
>>>    * folio_move_anon_rmap - move a folio to our anon_vma
>>>    * @folio:    The folio to move to our anon_vma
>>> @@ -1380,45 +1423,11 @@ static __always_inline void
>>> __folio_add_file_rmap(struct folio *folio,
>>>           struct page *page, int nr_pages, struct vm_area_struct *vma,
>>>           enum rmap_mode mode)
>>>   {
>>> -    atomic_t *mapped = &folio->_nr_pages_mapped;
>>> -    unsigned int nr_pmdmapped = 0, first;
>>> -    int nr = 0;
>>> +    unsigned int nr, nr_pmdmapped = 0;
>>
>> You're still being inconsistent with signed/unsigned here. Is there a reason
>> these can't be signed like nr_pages in the interface?
>
> I can turn them into signed values.
>
> Personally, I think it's misleading to use "signed" for values that can
> never meaningfully be negative. But sure, we can be consistent, at least
> in rmap code.
>

Well, it's an easy way to detect overflow? But I know what you mean. There are
lots of other APIs that accept signed/unsigned 32/64 bits; it's a mess. It would
be a tiny step in the right direction if a series could at least be consistent
with itself though, IMHO. :)
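
As a tiny standalone illustration of the overflow point (plain userspace C,
nothing kernel-specific):

    #include <stdio.h>

    int main(void)
    {
            unsigned int u_nr = 2;
            int s_nr = 2;

            /* Simulate racing ahead of a remove: subtract more than we added. */
            u_nr -= 5;      /* silently wraps to a huge positive value */
            s_nr -= 5;      /* goes negative, trivial to detect and clamp */

            if (s_nr < 0)
                    s_nr = 0;

            printf("unsigned: %u, signed (clamped): %d\n", u_nr, s_nr);
            return 0;
    }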

2023-12-19 08:42:55

by Ryan Roberts

[permalink] [raw]
Subject: Re: [PATCH v1 15/39] mm/huge_memory: batch rmap operations in __split_huge_pmd_locked()

On 18/12/2023 17:03, David Hildenbrand wrote:
> On 18.12.23 17:22, Ryan Roberts wrote:
>> On 11/12/2023 15:56, David Hildenbrand wrote:
>>> Let's use folio_add_anon_rmap_ptes(), batching the rmap operations.
>>>
>>> While at it, use more folio operations (but only in the code branch we're
>>> touching), use VM_WARN_ON_FOLIO(), and pass RMAP_EXCLUSIVE instead of
>>> manually setting PageAnonExclusive.
>>>
>>> We should never see non-anon pages on that branch: otherwise, the
>>> existing page_add_anon_rmap() call would have been flawed already.
>>>
>>> Signed-off-by: David Hildenbrand <[email protected]>
>>> ---
>>>   mm/huge_memory.c | 23 +++++++++++++++--------
>>>   1 file changed, 15 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>> index 1f5634b2f374..82ad68fe0d12 100644
>>> --- a/mm/huge_memory.c
>>> +++ b/mm/huge_memory.c
>>> @@ -2398,6 +2398,7 @@ static void __split_huge_pmd_locked(struct
>>> vm_area_struct *vma, pmd_t *pmd,
>>>           unsigned long haddr, bool freeze)
>>>   {
>>>       struct mm_struct *mm = vma->vm_mm;
>>> +    struct folio *folio;
>>>       struct page *page;
>>>       pgtable_t pgtable;
>>>       pmd_t old_pmd, _pmd;
>>> @@ -2493,16 +2494,18 @@ static void __split_huge_pmd_locked(struct
>>> vm_area_struct *vma, pmd_t *pmd,
>>>           uffd_wp = pmd_swp_uffd_wp(old_pmd);
>>>       } else {
>>>           page = pmd_page(old_pmd);
>>> +        folio = page_folio(page);
>>>           if (pmd_dirty(old_pmd)) {
>>>               dirty = true;
>>> -            SetPageDirty(page);
>>> +            folio_set_dirty(folio);
>>>           }
>>>           write = pmd_write(old_pmd);
>>>           young = pmd_young(old_pmd);
>>>           soft_dirty = pmd_soft_dirty(old_pmd);
>>>           uffd_wp = pmd_uffd_wp(old_pmd);
>>>   -        VM_BUG_ON_PAGE(!page_count(page), page);
>>> +        VM_WARN_ON_FOLIO(!folio_ref_count(folio), folio);
>>> +        VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
>>
>> Is this warning really correct? File-backed memory can be PMD-mapped with
>> CONFIG_READ_ONLY_THP_FOR_FS, so presumably it may also need to be
>> remapped as PTEs? Although I guess if we did have a file-backed folio, it
>> definitely wouldn't be correct to call page_add_anon_rmap() /
>> folio_add_anon_rmap_ptes()...
>
> Yes, see the patch description where I spell that out.

Oh god, how did I miss that... sorry!

>
> PTE-remapping a file-backed folio will simply zap the PMD and refault from the
> page cache after creating a page table.


Yep, that makes sense.

>
> So this is anon-only code.
>