2023-05-25 20:19:38

by Matthew Wilcox

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On Thu, May 25, 2023 at 01:15:07PM -0600, Khalid Aziz wrote:
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..b548e05f0349 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,42 @@ static bool too_many_isolated(pg_data_t *pgdat)
> return too_many;
> }
>
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.
> + * This code is inspired by similar code in migrate_vma_check_page(),
> + * can_split_folio() and folio_migrate_mapping()
> + */
> +static inline bool page_has_extra_refs(struct page *page,
> +					struct address_space *mapping)
> +{
> +	unsigned long extra_refs;
> +	struct folio *folio;
> +
> +	/*
> +	 * Skip this check for pages in ZONE_MOVABLE or MIGRATE_CMA
> +	 * pages that can not be long term pinned
> +	 */
> +	if (is_zone_movable_page(page) || is_migrate_cma_page(page))
> +		return false;
> +
> +	folio = page_folio(page);
> +
> +	/*
> +	 * caller holds a ref already from get_page_unless_zero()
> +	 * which is accounted for in folio_expected_refs()
> +	 */
> +	extra_refs = folio_expected_refs(mapping, folio);
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((folio_ref_count(folio) - extra_refs) > folio_mapcount(folio))
> +		return true;
> +	return false;
> +}
> +
> /**
> * isolate_migratepages_block() - isolate all migrate-able pages within
> * a single pageblock
> @@ -992,12 +1028,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_fail;

Just out of shot, we have ...

if (unlikely(!get_page_unless_zero(page)))

This is the perfect opportunity to use folio_get_nontail_page() instead.
You get back the folio without having to cast the pointer yourself
or call page_folio(). Now you can use a folio throughout your new
function, saving a call to compound_head().
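
[Editor's illustration, not part of the original mail.] A minimal userspace
sketch of that suggestion, assuming a simplified stand-in for the folio.
Every model_* name and field is hypothetical. The expected-reference rule
only approximates what folio_expected_refs() is understood to count, so
read it as an assumption rather than the kernel's implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for a folio; field names are hypothetical. */
struct model_folio {
	int refcount;      /* stands in for folio_ref_count() */
	int mapcount;      /* stands in for folio_mapcount() */
	int nr_pages;      /* stands in for folio_nr_pages() */
	bool has_mapping;  /* true for page-cache folios, false for anon */
	bool has_private;  /* stands in for folio_test_private() */
};

/*
 * Models folio_get_nontail_page(): take a reference unless the count is
 * already zero, and hand back the folio so the caller needs no
 * page_folio() lookup afterwards.
 */
static struct model_folio *model_get_nontail(struct model_folio *f)
{
	if (f->refcount == 0)
		return NULL;
	f->refcount++;
	return f;
}

/* Assumed rule: references migration expects, including the caller's own. */
static int model_expected_refs(const struct model_folio *f)
{
	int refs = 1;

	if (!f->has_mapping)
		return refs;
	refs += f->nr_pages;
	if (f->has_private)
		refs++;
	return refs;
}

/* The patch's check, operating on the folio the caller already holds. */
static bool model_folio_has_extra_refs(const struct model_folio *f)
{
	return f->refcount - model_expected_refs(f) > f->mapcount;
}
```

In this model the caller gets the folio back from the reference-taking step
and passes it straight to the check, so the check never has to derive the
folio from a page pointer.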

For a followup patch, everything in this loop below this point can use
the folio ... that's quite a lot of change.

> /*
> - * Migration will fail if an anonymous page is pinned in memory,
> - * so avoid taking lru_lock and isolating it unnecessarily in an
> - * admittedly racy check.
> + * Migration will fail if a page has extra refcounts
> + * from long term pinning preventing it from migrating,
> + * so avoid taking lru_lock and isolating it unnecessarily.
> */

Isn't "long term pinning" the wrong description of the problem? Long term
pins suggest to me FOLL_LONGTERM. I think this is simple short term
pins that we care about here.



2023-05-25 20:42:19

by Steven Sistare

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On 5/25/2023 3:58 PM, Matthew Wilcox wrote:
> On Thu, May 25, 2023 at 01:15:07PM -0600, Khalid Aziz wrote:
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..b548e05f0349 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,42 @@ static bool too_many_isolated(pg_data_t *pgdat)
>> return too_many;
>> }
>>
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it has extra refcounts that will prevent it from being migrated.
>> + * This code is inspired by similar code in migrate_vma_check_page(),
>> + * can_split_folio() and folio_migrate_mapping()
>> + */
>> +static inline bool page_has_extra_refs(struct page *page,
>> +					struct address_space *mapping)
>> +{
>> +	unsigned long extra_refs;
>> +	struct folio *folio;
>> +
>> +	/*
>> +	 * Skip this check for pages in ZONE_MOVABLE or MIGRATE_CMA
>> +	 * pages that can not be long term pinned
>> +	 */
>> +	if (is_zone_movable_page(page) || is_migrate_cma_page(page))
>> +		return false;
>> +
>> +	folio = page_folio(page);
>> +
>> +	/*
>> +	 * caller holds a ref already from get_page_unless_zero()
>> +	 * which is accounted for in folio_expected_refs()
>> +	 */
>> +	extra_refs = folio_expected_refs(mapping, folio);
>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page is pinned and can not be migrated
>> +	 */
>> +	if ((folio_ref_count(folio) - extra_refs) > folio_mapcount(folio))
>> +		return true;
>> +	return false;
>> +}
>> +
>> /**
>> * isolate_migratepages_block() - isolate all migrate-able pages within
>> * a single pageblock
>> @@ -992,12 +1028,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> goto isolate_fail;
>
> Just out of shot, we have ...
>
> if (unlikely(!get_page_unless_zero(page)))
>
> This is the perfect opportunity to use folio_get_nontail_page() instead.
> You get back the folio without having to cast the pointer yourself
> or call page_folio(). Now you can use a folio throughout your new
> function, saving a call to compound_head().
>
> For a followup patch, everything in this loop below this point can use
> the folio ... that's quite a lot of change.
>
>> /*
>> - * Migration will fail if an anonymous page is pinned in memory,
>> - * so avoid taking lru_lock and isolating it unnecessarily in an
>> - * admittedly racy check.
>> + * Migration will fail if a page has extra refcounts
>> + * from long term pinning preventing it from migrating,
>> + * so avoid taking lru_lock and isolating it unnecessarily.
>> */
>
> Isn't "long term pinning" the wrong description of the problem? Long term
> pins suggest to me FOLL_LONGTERM. I think this is simple short term
> pins that we care about here.

vfio pins are held for a long time - Steve

2023-05-25 20:51:09

by Khalid Aziz

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On 5/25/23 13:58, Matthew Wilcox wrote:
> On Thu, May 25, 2023 at 01:15:07PM -0600, Khalid Aziz wrote:
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 5a9501e0ae01..b548e05f0349 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -764,6 +764,42 @@ static bool too_many_isolated(pg_data_t *pgdat)
>> return too_many;
>> }
>>
>> +/*
>> + * Check if this base page should be skipped from isolation because
>> + * it has extra refcounts that will prevent it from being migrated.
>> + * This code is inspired by similar code in migrate_vma_check_page(),
>> + * can_split_folio() and folio_migrate_mapping()
>> + */
>> +static inline bool page_has_extra_refs(struct page *page,
>> +					struct address_space *mapping)
>> +{
>> +	unsigned long extra_refs;
>> +	struct folio *folio;
>> +
>> +	/*
>> +	 * Skip this check for pages in ZONE_MOVABLE or MIGRATE_CMA
>> +	 * pages that can not be long term pinned
>> +	 */
>> +	if (is_zone_movable_page(page) || is_migrate_cma_page(page))
>> +		return false;
>> +
>> +	folio = page_folio(page);
>> +
>> +	/*
>> +	 * caller holds a ref already from get_page_unless_zero()
>> +	 * which is accounted for in folio_expected_refs()
>> +	 */
>> +	extra_refs = folio_expected_refs(mapping, folio);
>> +
>> +	/*
>> +	 * This is an admittedly racy check but good enough to determine
>> +	 * if a page is pinned and can not be migrated
>> +	 */
>> +	if ((folio_ref_count(folio) - extra_refs) > folio_mapcount(folio))
>> +		return true;
>> +	return false;
>> +}
>> +
>> /**
>> * isolate_migratepages_block() - isolate all migrate-able pages within
>> * a single pageblock
>> @@ -992,12 +1028,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> goto isolate_fail;
>
> Just out of shot, we have ...
>
> if (unlikely(!get_page_unless_zero(page)))
>
> This is the perfect opportunity to use folio_get_nontail_page() instead.
> You get back the folio without having to cast the pointer yourself
> or call page_folio(). Now you can use a folio throughout your new
> function, saving a call to compound_head().
>
> For a followup patch, everything in this loop below this point can use
> the folio ... that's quite a lot of change.

Can that all be in a separate patch by itself? I tried to keep all folio
functions contained inside page_has_extra_refs(). If we change part of
isolate_migratepages_block() to use folios, it would make sense to change
the rest of the function at the same time.

>
>> /*
>> - * Migration will fail if an anonymous page is pinned in memory,
>> - * so avoid taking lru_lock and isolating it unnecessarily in an
>> - * admittedly racy check.
>> + * Migration will fail if a page has extra refcounts
>> + * from long term pinning preventing it from migrating,
>> + * so avoid taking lru_lock and isolating it unnecessarily.
>> */
>
> Isn't "long term pinning" the wrong description of the problem? Long term
> pins suggest to me FOLL_LONGTERM. I think this is simple short term
> pins that we care about here.
>

As Steve pointed out, vfio pinned pages are pinned long term, and long term
pinned pages are what we are concerned about: no matter how many times we
scan them, they will not migrate.

Thanks,
Khalid


2023-05-25 21:01:36

by Matthew Wilcox

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On Thu, May 25, 2023 at 04:15:07PM -0400, Steven Sistare wrote:
> On 5/25/2023 3:58 PM, Matthew Wilcox wrote:
> > On Thu, May 25, 2023 at 01:15:07PM -0600, Khalid Aziz wrote:
> >> diff --git a/mm/compaction.c b/mm/compaction.c
> >> index 5a9501e0ae01..b548e05f0349 100644
> >> --- a/mm/compaction.c
> >> +++ b/mm/compaction.c
> >> @@ -764,6 +764,42 @@ static bool too_many_isolated(pg_data_t *pgdat)
> >> return too_many;
> >> }
> >>
> >> +/*
> >> + * Check if this base page should be skipped from isolation because
> >> + * it has extra refcounts that will prevent it from being migrated.
> >> + * This code is inspired by similar code in migrate_vma_check_page(),
> >> + * can_split_folio() and folio_migrate_mapping()
> >> + */
> >> +static inline bool page_has_extra_refs(struct page *page,
> >> +					struct address_space *mapping)
> >> +{
> >> +	unsigned long extra_refs;
> >> +	struct folio *folio;
> >> +
> >> +	/*
> >> +	 * Skip this check for pages in ZONE_MOVABLE or MIGRATE_CMA
> >> +	 * pages that can not be long term pinned
> >> +	 */
> >> +	if (is_zone_movable_page(page) || is_migrate_cma_page(page))
> >> +		return false;
> >> +
> >> +	folio = page_folio(page);
> >> +
> >> +	/*
> >> +	 * caller holds a ref already from get_page_unless_zero()
> >> +	 * which is accounted for in folio_expected_refs()
> >> +	 */
> >> +	extra_refs = folio_expected_refs(mapping, folio);
> >> +
> >> +	/*
> >> +	 * This is an admittedly racy check but good enough to determine
> >> +	 * if a page is pinned and can not be migrated
> >> +	 */
> >> +	if ((folio_ref_count(folio) - extra_refs) > folio_mapcount(folio))
> >> +		return true;
> >> +	return false;
> >> +}
> >> +
> >> /**
> >> * isolate_migratepages_block() - isolate all migrate-able pages within
> >> * a single pageblock
> >> @@ -992,12 +1028,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> >> goto isolate_fail;
> >
> > Just out of shot, we have ...
> >
> > if (unlikely(!get_page_unless_zero(page)))
> >
> > This is the perfect opportunity to use folio_get_nontail_page() instead.
> > You get back the folio without having to cast the pointer yourself
> > or call page_folio(). Now you can use a folio throughout your new
> > function, saving a call to compound_head().
> >
> > For a followup patch, everything in this loop below this point can use
> > the folio ... that's quite a lot of change.
> >
> >> /*
> >> - * Migration will fail if an anonymous page is pinned in memory,
> >> - * so avoid taking lru_lock and isolating it unnecessarily in an
> >> - * admittedly racy check.
> >> + * Migration will fail if a page has extra refcounts
> >> + * from long term pinning preventing it from migrating,
> >> + * so avoid taking lru_lock and isolating it unnecessarily.
> >> */
> >
> > Isn't "long term pinning" the wrong description of the problem? Long term
> > pins suggest to me FOLL_LONGTERM. I think this is simple short term
> > pins that we care about here.
>
> vfio pins are held for a long time - Steve

So this is a third sense of "pinned pages" that is neither what
filesystems nor the mm means by pinned pages, but whatever it is that
vfio means by pinned pages? If only "pin" weren't such a desirable
word. Can somebody explain to me in small words what a vfio pin looks
like because I've tried reading vfio_iommu_type1_pin_pages() and I
don't recognise anything there that looks like pinning in either of
the other two senses.


2023-05-25 21:37:15

by Matthew Wilcox

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On Thu, May 25, 2023 at 09:45:34PM +0100, Matthew Wilcox wrote:
> > > Isn't "long term pinning" the wrong description of the problem? Long term
> > > pins suggest to me FOLL_LONGTERM. I think this is simple short term
> > > pins that we care about here.
> >
> > vfio pins are held for a long time - Steve
>
> So this is a third sense of "pinned pages" that is neither what
> filesystems nor the mm means by pinned pages, but whatever it is that
> vfio means by pinned pages? If only "pin" weren't such a desirable
> word. Can somebody explain to me in small words what a vfio pin looks
> like because I've tried reading vfio_iommu_type1_pin_pages() and I
> don't recognise anything there that looks like pinning in either of
> the other two senses.

Oh, I think I found it! pin_user_pages_remote() is called by
vaddr_get_pfns(). If these are the pages you're concerned about,
then the efficient way to do what you want is simply to call
folio_maybe_dma_pinned(). Far more efficient than the current mess
of total_mapcount().
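
[Editor's illustration, not part of the original mail.] A rough userspace
model of why such a test is cheap, assuming the scheme in which a pin adds a
large bias to a small folio's refcount and bumps a separate counter on a
large folio. The model_* names are hypothetical, and the bias value is
assumed to match GUP_PIN_COUNTING_BIAS.

```c
#include <stdbool.h>

/* Assumed to match the kernel's GUP_PIN_COUNTING_BIAS (1U << 10). */
#define MODEL_PIN_BIAS (1U << 10)

/* Userspace stand-in for a folio; field names are hypothetical. */
struct model_folio {
	unsigned int refcount;  /* stands in for folio_ref_count() */
	unsigned int pincount;  /* separate pin counter, large folios only */
	bool large;             /* stands in for folio_test_large() */
};

/* Models one pin taken through the pin_user_pages*() family. */
static void model_pin(struct model_folio *f)
{
	if (f->large) {
		f->refcount++;
		f->pincount++;
	} else {
		f->refcount += MODEL_PIN_BIAS;
	}
}

/*
 * Models the "maybe pinned" test: a single counter read and compare.
 * A small folio holding 1024 or more ordinary references also tests
 * true, hence "maybe".
 */
static bool model_maybe_dma_pinned(const struct model_folio *f)
{
	if (f->large)
		return f->pincount > 0;
	return f->refcount >= MODEL_PIN_BIAS;
}
```

The test reads one counter and never walks mappings, which is the source of
the efficiency claimed above. The cost is the possible false positive noted
in the comment.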

2023-05-26 16:29:05

by Khalid Aziz

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On 5/25/23 15:31, Matthew Wilcox wrote:
> On Thu, May 25, 2023 at 09:45:34PM +0100, Matthew Wilcox wrote:
>>>> Isn't "long term pinning" the wrong description of the problem? Long term
>>>> pins suggest to me FOLL_LONGTERM. I think this is simple short term
>>>> pins that we care about here.
>>>
>>> vfio pins are held for a long time - Steve
>>
>> So this is a third sense of "pinned pages" that is neither what
>> filesystems nor the mm means by pinned pages, but whatever it is that
>> vfio means by pinned pages? If only "pin" weren't such a desirable
>> word. Can somebody explain to me in small words what a vfio pin looks
>> like because I've tried reading vfio_iommu_type1_pin_pages() and I
>> don't recognise anything there that looks like pinning in either of
>> the other two senses.
>
> Oh, I think I found it! pin_user_pages_remote() is called by
> vaddr_get_pfns(). If these are the pages you're concerned about,
> then the efficient way to do what you want is simply to call
> folio_maybe_dma_pinned(). Far more efficient than the current mess
> of total_mapcount().

vfio pinned pages triggered this change. Wouldn't checking refcounts against
mapcount provide a more generalized way of detecting non-migratable pages?

Thanks,
Khalid

2023-05-26 17:00:34

by Matthew Wilcox

Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan

On Fri, May 26, 2023 at 09:44:34AM -0600, Khalid Aziz wrote:
> > Oh, I think I found it! pin_user_pages_remote() is called by
> > vaddr_get_pfns(). If these are the pages you're concerned about,
> > then the efficient way to do what you want is simply to call
> > folio_maybe_dma_pinned(). Far more efficient than the current mess
> > of total_mapcount().
>
> vfio pinned pages triggered this change. Wouldn't checking refcounts against
> mapcount provide a more generalized way of detecting non-migratable pages?

Well, you changed the comment to say that we were concerned about
long-term pins. If we are, then folio_maybe_dma_pinned() is how to test
for long-term pins. If we want to skip pages which are short-term pinned,
then we need to not change the comment, and keep using mapcount/refcount
differences.
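
[Editor's illustration, not part of the original mail.] The two options can
be contrasted in a small userspace model, restricted to a small anonymous
folio. All model_* names are hypothetical, the bias value is assumed to
match GUP_PIN_COUNTING_BIAS, and the "1" stands for the scanner's own
reference.

```c
#include <stdbool.h>

#define MODEL_PIN_BIAS 1024  /* assumed value of GUP_PIN_COUNTING_BIAS */

/* Small anonymous folio only; field names are hypothetical. */
struct model_folio {
	int refcount;  /* stands in for folio_ref_count() */
	int mapcount;  /* stands in for folio_mapcount() */
};

/*
 * Option A: refcount/mapcount difference, as in the patch. Any reference
 * beyond the mappings and the scanner's own one counts as "extra".
 */
static bool model_skip_by_refcount(const struct model_folio *f)
{
	return f->refcount - 1 > f->mapcount;
}

/* Option B: bias test in the style of folio_maybe_dma_pinned(). */
static bool model_skip_by_pin(const struct model_folio *f)
{
	return f->refcount >= MODEL_PIN_BIAS;
}
```

In this model a folio with one transient extra reference is skipped by
option A only, while a pinned folio is skipped by both. So the comment and
the test have to be chosen together.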