2023-09-12 19:39:06

by Zi Yan

Subject: [RFC PATCH 0/4] Enable >0 order folio memory compaction

From: Zi Yan <[email protected]>

Hi all,

This patchset enables >0 order folio memory compaction, which is one of
the prerequisites for large folio support[1]. It is on top of
mm-everything-2023-09-11-22-56.

Overview
===

To support >0 order folio compaction, the patchset changes how free pages used
for migration are kept during compaction. Free pages used to be split into
order-0 pages and post-allocation processed immediately (i.e., the PageBuddy
flag is cleared, the page order stored in page->private is zeroed, and the
page reference count is set to 1). Now all free pages are kept in a
MAX_ORDER+1 array of page lists based on their order, without post-allocation
processing. When migrate_pages() asks for a new page, one of the free pages,
chosen based on the requested page order, is then processed and given out.


Optimizations
===

1. Free page split is added to increase the migration success rate in case
a source page does not have a matching free page in the free page lists.
Free page merge is possible but not implemented, since the existing
PFN-based buddy page merge algorithm requires identifying buddy pages,
but free pages kept for memory compaction cannot have PageBuddy set
without confusing other PFN scanners.

2. Source pages are sorted in ascending order before migration to reduce
free page splits. Otherwise, high order free pages might be prematurely
split, causing undesired migration failures for high order folios.


TODOs
===

1. Refactor the free page post-allocation and free page preparation code so
that compaction_alloc() and compaction_free() can call common functions
instead of hard-coding the logic.

2. One possible optimization is to allow migrate_pages() to continue even
if get_new_folio() returns NULL. In general, a NULL return means there is
not enough memory, but in the >0 order folio compaction case it only means
there is no suitable free page at the source page's order. It might be
better to skip that page and migrate the rest to achieve a better
compaction result.

3. Another possible optimization is to enable free page merge. It is
possible that a to-be-migrated page causes a free page split and then
eventually fails to migrate. Without a free page merge function, we would
lose a high order free page. But reusing the existing PFN-based buddy page
merge requires a way of identifying free pages kept for memory compaction.

4. The implemented >0 order folio compaction algorithm is quite naive
and does not consider all possible situations. A better algorithm can
improve compaction success rate.


Feel free to give comments and ask questions.

Thanks.


[1] https://lore.kernel.org/linux-mm/[email protected]/

Zi Yan (4):
mm/compaction: add support for >0 order folio memory compaction.
mm/compaction: optimize >0 order folio compaction with free page
split.
mm/compaction: optimize >0 order folio compaction by sorting source
pages.
mm/compaction: enable compacting >0 order folios.

mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
mm/internal.h | 7 +-
2 files changed, 176 insertions(+), 36 deletions(-)

--
2.40.1


2023-09-12 21:10:47

by Zi Yan

Subject: [RFC PATCH 2/4] mm/compaction: optimize >0 order folio compaction with free page split.

From: Zi Yan <[email protected]>

During migration in memory compaction, free pages are placed in an array of
page lists based on their order. But the desired free page order (i.e., the
order of a source page) might not always be present, thus leading to
migration failures. Split a high order free page when the source migration
page has a lower order, to increase the migration success rate.

Note: merging free pages when a migration fails and a lower order free
page is returned via compaction_free() is possible, but it is a lot of
work. Since the free pages are not buddy pages, it is hard to identify
them using the existing PFN-based page merging algorithm.

Signed-off-by: Zi Yan <[email protected]>
---
mm/compaction.c | 40 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 868e92e55d27..45747ab5f380 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1801,9 +1801,46 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
struct compact_control *cc = (struct compact_control *)data;
struct folio *dst;
int order = folio_order(src);
+ bool has_isolated_pages = false;

+again:
if (!cc->freepages[order].nr_free) {
- isolate_freepages(cc);
+ int i;
+
+ for (i = order + 1; i <= MAX_ORDER; i++) {
+ if (cc->freepages[i].nr_free) {
+ struct page *freepage =
+ list_first_entry(&cc->freepages[i].pages,
+ struct page, lru);
+
+ int start_order = i;
+ unsigned long size = 1 << start_order;
+
+ list_del(&freepage->lru);
+ cc->freepages[i].nr_free--;
+
+ while (start_order > order) {
+ start_order--;
+ size >>= 1;
+
+ list_add(&freepage[size].lru,
+ &cc->freepages[start_order].pages);
+ cc->freepages[start_order].nr_free++;
+ set_page_private(&freepage[size], start_order);
+ }
+ post_alloc_hook(freepage, order, __GFP_MOVABLE);
+ if (order)
+ prep_compound_page(freepage, order);
+ dst = page_folio(freepage);
+ goto done;
+ }
+ }
+ if (!has_isolated_pages) {
+ isolate_freepages(cc);
+ has_isolated_pages = true;
+ goto again;
+ }
+
if (!cc->freepages[order].nr_free)
return NULL;
}
@@ -1814,6 +1851,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
if (order)
prep_compound_page(&dst->page, order);
+done:
cc->nr_freepages -= 1 << order;
return dst;
}
--
2.40.1

2023-09-13 00:31:21

by Zi Yan

Subject: [RFC PATCH 3/4] mm/compaction: optimize >0 order folio compaction by sorting source pages.

From: Zi Yan <[email protected]>

Sort source pages by their order before migration. This should maximize
high order free page use and minimize free page splits. It might be useful
until free page merging is implemented.

Signed-off-by: Zi Yan <[email protected]>
---
mm/compaction.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 45747ab5f380..4300d877b824 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -145,6 +145,38 @@ static void sort_free_pages(struct list_head *src, struct free_list *dst)
}
}

+static void sort_folios_by_order(struct list_head *pages)
+{
+ struct free_list page_list[MAX_ORDER + 1];
+ int order;
+ struct folio *folio, *next;
+
+ for (order = 0; order <= MAX_ORDER; order++) {
+ INIT_LIST_HEAD(&page_list[order].pages);
+ page_list[order].nr_free = 0;
+ }
+
+ list_for_each_entry_safe(folio, next, pages, lru) {
+ order = folio_order(folio);
+
+ if (order > MAX_ORDER)
+ continue;
+
+ list_move(&folio->lru, &page_list[order].pages);
+ page_list[order].nr_free++;
+ }
+
+ for (order = MAX_ORDER; order >= 0; order--) {
+ if (page_list[order].nr_free) {
+
+ list_for_each_entry_safe(folio, next,
+ &page_list[order].pages, lru) {
+ list_move_tail(&folio->lru, pages);
+ }
+ }
+ }
+}
+
#ifdef CONFIG_COMPACTION
bool PageMovable(struct page *page)
{
@@ -2636,6 +2668,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
pageblock_start_pfn(cc->migrate_pfn - 1));
}

+ sort_folios_by_order(&cc->migratepages);
+
err = migrate_pages(&cc->migratepages, compaction_alloc,
compaction_free, (unsigned long)cc, cc->mode,
MR_COMPACTION, &nr_succeeded);
--
2.40.1

2023-09-13 14:13:19

by Zi Yan

Subject: [RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.

From: Zi Yan <[email protected]>

Since the compaction code can now compact >0 order folios, enable isolating
them during the process.

Signed-off-by: Zi Yan <[email protected]>
---
mm/compaction.c | 25 ++++++++++---------------
1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 4300d877b824..f72af74094de 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1087,11 +1087,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
if (PageCompound(page) && !cc->alloc_contig) {
const unsigned int order = compound_order(page);

- if (likely(order <= MAX_ORDER)) {
- low_pfn += (1UL << order) - 1;
- nr_scanned += (1UL << order) - 1;
+ /*
+ * Compacting > pageblock_order pages does not improve
+ * memory fragmentation. Also skip hugetlbfs pages.
+ */
+ if (likely(order >= pageblock_order) || PageHuge(page)) {
+ if (order <= MAX_ORDER) {
+ low_pfn += (1UL << order) - 1;
+ nr_scanned += (1UL << order) - 1;
+ }
+ goto isolate_fail;
}
- goto isolate_fail;
}

/*
@@ -1214,17 +1220,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
goto isolate_abort;
}
}
-
- /*
- * folio become large since the non-locked check,
- * and it's on LRU.
- */
- if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
- low_pfn += folio_nr_pages(folio) - 1;
- nr_scanned += folio_nr_pages(folio) - 1;
- folio_set_lru(folio);
- goto isolate_fail_put;
- }
}

/* The folio is taken off the LRU */
--
2.40.1

2023-09-15 09:47:08

by Baolin Wang

Subject: Re: [RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.



On 9/13/2023 12:28 AM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Since compaction code can compact >0 order folios, enable it during the
> process.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/compaction.c | 25 ++++++++++---------------
> 1 file changed, 10 insertions(+), 15 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 4300d877b824..f72af74094de 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1087,11 +1087,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> if (PageCompound(page) && !cc->alloc_contig) {
> const unsigned int order = compound_order(page);
>
> - if (likely(order <= MAX_ORDER)) {
> - low_pfn += (1UL << order) - 1;
> - nr_scanned += (1UL << order) - 1;
> + /*
> + * Compacting > pageblock_order pages does not improve
> + * memory fragmentation. Also skip hugetlbfs pages.
> + */
> + if (likely(order >= pageblock_order) || PageHuge(page)) {

IMO, if the compound page order is larger than the requested cc->order,
we should also fail the isolation, because it does not help reduce
fragmentation either, right?

> + if (order <= MAX_ORDER) {
> + low_pfn += (1UL << order) - 1;
> + nr_scanned += (1UL << order) - 1;
> + }
> + goto isolate_fail;
> }
> - goto isolate_fail;
> }
>
> /*
> @@ -1214,17 +1220,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
> goto isolate_abort;
> }
> }
> -
> - /*
> - * folio become large since the non-locked check,
> - * and it's on LRU.
> - */
> - if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
> - low_pfn += folio_nr_pages(folio) - 1;
> - nr_scanned += folio_nr_pages(folio) - 1;
> - folio_set_lru(folio);
> - goto isolate_fail_put;
> - }

I do not think you can remove this validation, since the previous validation
is lockless. So under the lock, we need to re-check whether the compound
page order is larger than pageblock_order or cc->order, and fail the
isolation if so.

> }
>
> /* The folio is taken off the LRU */

2023-09-18 07:58:51

by Baolin Wang

Subject: Re: [RFC PATCH 2/4] mm/compaction: optimize >0 order folio compaction with free page split.



On 9/13/2023 12:28 AM, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> During migration in a memory compaction, free pages are placed in an array
> of page lists based on their order. But the desired free page order (i.e.,
> the order of a source page) might not be always present, thus leading to
> migration failures. Split a high order free pages when source migration
> page has a lower order to increase migration successful rate.
>
> Note: merging free pages when a migration fails and a lower order free
> page is returned via compaction_free() is possible, but there is too much
> work. Since the free pages are not buddy pages, it is hard to identify
> these free pages using existing PFN-based page merging algorithm.
>
> Signed-off-by: Zi Yan <[email protected]>
> ---
> mm/compaction.c | 40 +++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 39 insertions(+), 1 deletion(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 868e92e55d27..45747ab5f380 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1801,9 +1801,46 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
> struct compact_control *cc = (struct compact_control *)data;
> struct folio *dst;
> int order = folio_order(src);
> + bool has_isolated_pages = false;
>
> +again:
> if (!cc->freepages[order].nr_free) {
> - isolate_freepages(cc);
> + int i;
> +
> + for (i = order + 1; i <= MAX_ORDER; i++) {
> + if (cc->freepages[i].nr_free) {
> + struct page *freepage =
> + list_first_entry(&cc->freepages[i].pages,
> + struct page, lru);
> +
> + int start_order = i;
> + unsigned long size = 1 << start_order;
> +
> + list_del(&freepage->lru);
> + cc->freepages[i].nr_free--;
> +
> + while (start_order > order) {
> + start_order--;
> + size >>= 1;
> +
> + list_add(&freepage[size].lru,
> + &cc->freepages[start_order].pages);
> + cc->freepages[start_order].nr_free++;
> + set_page_private(&freepage[size], start_order);

IIUC, these split pages should also be initialized by calling functions
such as prep_compound_page()?

> + }
> + post_alloc_hook(freepage, order, __GFP_MOVABLE);
> + if (order)
> + prep_compound_page(freepage, order);
> + dst = page_folio(freepage);
> + goto done;
> + }
> + }
> + if (!has_isolated_pages) {
> + isolate_freepages(cc);
> + has_isolated_pages = true;
> + goto again;
> + }
> +
> if (!cc->freepages[order].nr_free)
> return NULL;
> }
> @@ -1814,6 +1851,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
> post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
> if (order)
> prep_compound_page(&dst->page, order);
> +done:
> cc->nr_freepages -= 1 << order;
> return dst;
> }

2023-09-18 20:52:00

by Zi Yan

Subject: Re: [RFC PATCH 2/4] mm/compaction: optimize >0 order folio compaction with free page split.

On 18 Sep 2023, at 3:34, Baolin Wang wrote:

> On 9/13/2023 12:28 AM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> During migration in a memory compaction, free pages are placed in an array
>> of page lists based on their order. But the desired free page order (i.e.,
>> the order of a source page) might not be always present, thus leading to
>> migration failures. Split a high order free pages when source migration
>> page has a lower order to increase migration successful rate.
>>
>> Note: merging free pages when a migration fails and a lower order free
>> page is returned via compaction_free() is possible, but there is too much
>> work. Since the free pages are not buddy pages, it is hard to identify
>> these free pages using existing PFN-based page merging algorithm.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/compaction.c | 40 +++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 868e92e55d27..45747ab5f380 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1801,9 +1801,46 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>> struct compact_control *cc = (struct compact_control *)data;
>> struct folio *dst;
>> int order = folio_order(src);
>> + bool has_isolated_pages = false;
>> +again:
>> if (!cc->freepages[order].nr_free) {
>> - isolate_freepages(cc);
>> + int i;
>> +
>> + for (i = order + 1; i <= MAX_ORDER; i++) {
>> + if (cc->freepages[i].nr_free) {
>> + struct page *freepage =
>> + list_first_entry(&cc->freepages[i].pages,
>> + struct page, lru);
>> +
>> + int start_order = i;
>> + unsigned long size = 1 << start_order;
>> +
>> + list_del(&freepage->lru);
>> + cc->freepages[i].nr_free--;
>> +
>> + while (start_order > order) {
>> + start_order--;
>> + size >>= 1;
>> +
>> + list_add(&freepage[size].lru,
>> + &cc->freepages[start_order].pages);
>> + cc->freepages[start_order].nr_free++;
>> + set_page_private(&freepage[size], start_order);
>
> IIUC, these split pages should also call functions to initialize? e.g. prep_compound_page()?

Not at this place. It is done right below, above the "done" label. While free
pages are on cc->freepages, we want to keep them unprocessed by
post_alloc_hook() or prep_compound_page() to allow a possible future split. A
free page is only initialized when it is returned by compaction_alloc().

>
>> + }
>> + post_alloc_hook(freepage, order, __GFP_MOVABLE);
>> + if (order)
>> + prep_compound_page(freepage, order);
>> + dst = page_folio(freepage);
>> + goto done;
>> + }
>> + }
>> + if (!has_isolated_pages) {
>> + isolate_freepages(cc);
>> + has_isolated_pages = true;
>> + goto again;
>> + }
>> +
>> if (!cc->freepages[order].nr_free)
>> return NULL;
>> }
>> @@ -1814,6 +1851,7 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>> post_alloc_hook(&dst->page, order, __GFP_MOVABLE);
>> if (order)
>> prep_compound_page(&dst->page, order);
>> +done:
>> cc->nr_freepages -= 1 << order;
>> return dst;
>> }


--
Best Regards,
Yan, Zi



2023-09-18 22:37:52

by Zi Yan

Subject: Re: [RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.

On 15 Sep 2023, at 5:41, Baolin Wang wrote:

> On 9/13/2023 12:28 AM, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Since compaction code can compact >0 order folios, enable it during the
>> process.
>>
>> Signed-off-by: Zi Yan <[email protected]>
>> ---
>> mm/compaction.c | 25 ++++++++++---------------
>> 1 file changed, 10 insertions(+), 15 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 4300d877b824..f72af74094de 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1087,11 +1087,17 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> if (PageCompound(page) && !cc->alloc_contig) {
>> const unsigned int order = compound_order(page);
>> - if (likely(order <= MAX_ORDER)) {
>> - low_pfn += (1UL << order) - 1;
>> - nr_scanned += (1UL << order) - 1;
>> + /*
>> + * Compacting > pageblock_order pages does not improve
>> + * memory fragmentation. Also skip hugetlbfs pages.
>> + */
>> + if (likely(order >= pageblock_order) || PageHuge(page)) {
>
> IMO, if the compound page order is larger than the requested cc->order, we should also fail the isolation, cause it also does not improve fragmentation, right?
>

Probably yes. I think the reasoning should be: since compaction is asking for
cc->order, we should not compact folios with orders larger than or equal to
that, because cc->order tells us the max free page order is smaller than it;
otherwise the allocation would already have happened. I will add this
condition in the next version.

>> + if (order <= MAX_ORDER) {
>> + low_pfn += (1UL << order) - 1;
>> + nr_scanned += (1UL << order) - 1;
>> + }
>> + goto isolate_fail;
>> }
>> - goto isolate_fail;
>> }
>> /*
>> @@ -1214,17 +1220,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>> goto isolate_abort;
>> }
>> }
>> -
>> - /*
>> - * folio become large since the non-locked check,
>> - * and it's on LRU.
>> - */
>> - if (unlikely(folio_test_large(folio) && !cc->alloc_contig)) {
>> - low_pfn += folio_nr_pages(folio) - 1;
>> - nr_scanned += folio_nr_pages(folio) - 1;
>> - folio_set_lru(folio);
>> - goto isolate_fail_put;
>> - }
>
> I do not think you can remove this validation, since the previous validation is lockless. So under the lock, we need to re-check whether the compound page order is larger than pageblock_order or cc->order, and fail the isolation if so.

This check should go away, but a new order check for large folios should be
added. Will add it. Thanks.

--
Best Regards,
Yan, Zi



2023-09-20 08:19:34

by Baolin Wang

Subject: Re: [RFC PATCH 2/4] mm/compaction: optimize >0 order folio compaction with free page split.



On 9/19/2023 1:20 AM, Zi Yan wrote:
> On 18 Sep 2023, at 3:34, Baolin Wang wrote:
>
>> On 9/13/2023 12:28 AM, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> During migration in a memory compaction, free pages are placed in an array
>>> of page lists based on their order. But the desired free page order (i.e.,
>>> the order of a source page) might not be always present, thus leading to
>>> migration failures. Split a high order free pages when source migration
>>> page has a lower order to increase migration successful rate.
>>>
>>> Note: merging free pages when a migration fails and a lower order free
>>> page is returned via compaction_free() is possible, but there is too much
>>> work. Since the free pages are not buddy pages, it is hard to identify
>>> these free pages using existing PFN-based page merging algorithm.
>>>
>>> Signed-off-by: Zi Yan <[email protected]>
>>> ---
>>> mm/compaction.c | 40 +++++++++++++++++++++++++++++++++++++++-
>>> 1 file changed, 39 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index 868e92e55d27..45747ab5f380 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1801,9 +1801,46 @@ static struct folio *compaction_alloc(struct folio *src, unsigned long data)
>>> struct compact_control *cc = (struct compact_control *)data;
>>> struct folio *dst;
>>> int order = folio_order(src);
>>> + bool has_isolated_pages = false;
>>> +again:
>>> if (!cc->freepages[order].nr_free) {
>>> - isolate_freepages(cc);
>>> + int i;
>>> +
>>> + for (i = order + 1; i <= MAX_ORDER; i++) {
>>> + if (cc->freepages[i].nr_free) {
>>> + struct page *freepage =
>>> + list_first_entry(&cc->freepages[i].pages,
>>> + struct page, lru);
>>> +
>>> + int start_order = i;
>>> + unsigned long size = 1 << start_order;
>>> +
>>> + list_del(&freepage->lru);
>>> + cc->freepages[i].nr_free--;
>>> +
>>> + while (start_order > order) {
>>> + start_order--;
>>> + size >>= 1;
>>> +
>>> + list_add(&freepage[size].lru,
>>> + &cc->freepages[start_order].pages);
>>> + cc->freepages[start_order].nr_free++;
>>> + set_page_private(&freepage[size], start_order);
>>
>> IIUC, these split pages should also call functions to initialize? e.g. prep_compound_page()?
>
> Not at this place. It is done right below and above "done" label. When free pages
> are on cc->freepages, we want to keep them without being post_alloc_hook() or
> prep_compound_page() processed for a possible future split. A free page is
> only initialized when it is returned by compaction_alloc().

Ah, I see. Thanks for explanation.

2023-09-20 19:57:58

by kernel test robot

Subject: Re: [RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.



Hello,

kernel test robot noticed "kernel_BUG_at_lib/list_debug.c" on:

commit: 810d9ce367799ba4fef1e894b342e5ab74d44681 ("[RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.")
url: https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-compaction-add-support-for-0-order-folio-memory-compaction/20230913-003027
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [RFC PATCH 4/4] mm/compaction: enable compacting >0 order folios.

in testcase: vm-scalability
version: vm-scalability-x86_64-1.0-0_20220518
with following parameters:

runtime: 300s
test: lru-file-readtwice
cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/


compiler: gcc-12
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480L (Sapphire Rapids) with 512G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20230920/[email protected]


[ 104.256019][ T1493] list_del corruption, ffd40001e3611490->prev is NULL
[ 104.264911][ T1493] ------------[ cut here ]------------
[ 104.272315][ T1493] kernel BUG at lib/list_debug.c:54!
[ 104.279501][ T1493] invalid opcode: 0000 [#1] SMP NOPTI
[ 104.286658][ T1493] CPU: 91 PID: 1493 Comm: kcompactd1 Not tainted 6.6.0-rc1-00153-g810d9ce36779 #1
[ 104.298169][ T1493] Hardware name: NULL NULL/NULL, BIOS 05.02.01 05/12/2023
[ 104.307252][ T1493] RIP: 0010:__list_del_entry_valid_or_report+0x6e/0xf0
[ 104.315987][ T1493] Code: b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 80 c1 71 82 e8 e3 37 a3 ff 0f 0b 48 89 fe 48 c7 c7 b0 c1 71 82 e8 d2 37 a3 ff <0f> 0b 48 89 fe 48 c7 c7 e0 c1 71 82 e8 c1 37 a3 ff 0f 0b 48 89 fe
[ 104.339068][ T1493] RSP: 0018:ffa0000010a37910 EFLAGS: 00010046
[ 104.346919][ T1493] RAX: 0000000000000033 RBX: ff110080749b5ab8 RCX: 0000000000000000
[ 104.356938][ T1493] RDX: 0000000000000000 RSI: ff11007f416dc6c0 RDI: ff11007f416dc6c0
[ 104.366914][ T1493] RBP: ff110040b00af858 R08: 0000000000000000 R09: ffa0000010a377b8
[ 104.376873][ T1493] R10: 0000000000000003 R11: ff11007f40dfffe8 R12: ffd40001e3611400
[ 104.386808][ T1493] R13: 0000000000000000 R14: ffd40001e3611400 R15: ffa0000010a37938
[ 104.396739][ T1493] FS: 0000000000000000(0000) GS:ff11007f416c0000(0000) knlGS:0000000000000000
[ 104.407739][ T1493] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 104.416072][ T1493] CR2: 000055a550b5eb38 CR3: 0000008069078004 CR4: 0000000000f71ee0
[ 104.425986][ T1493] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 104.435870][ T1493] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 104.445790][ T1493] PKRU: 55555554
[ 104.450668][ T1493] Call Trace:
[ 104.455221][ T1493] <TASK>
[ 104.459360][ T1493] ? die+0x36/0xb0
[ 104.464363][ T1493] ? do_trap+0xda/0x130
[ 104.469839][ T1493] ? __list_del_entry_valid_or_report+0x6e/0xf0
[ 104.477666][ T1493] ? do_error_trap+0x65/0xb0
[ 104.483614][ T1493] ? __list_del_entry_valid_or_report+0x6e/0xf0
[ 104.491418][ T1493] ? exc_invalid_op+0x50/0x70
[ 104.497453][ T1493] ? __list_del_entry_valid_or_report+0x6e/0xf0
[ 104.505246][ T1493] ? asm_exc_invalid_op+0x1a/0x20
[ 104.511655][ T1493] ? __list_del_entry_valid_or_report+0x6e/0xf0
[ 104.519423][ T1493] split_huge_page_to_list+0x3ad/0x5b0
[ 104.526306][ T1493] migrate_pages_batch+0x1f6/0x970
[ 104.532797][ T1493] ? __pfx_compaction_alloc+0x10/0x10
[ 104.539564][ T1493] ? __pfx_compaction_free+0x10/0x10
[ 104.546219][ T1493] ? __pfx_compaction_alloc+0x10/0x10
[ 104.552955][ T1493] migrate_pages_sync+0x99/0x230
[ 104.559201][ T1493] ? __pfx_compaction_alloc+0x10/0x10
[ 104.565917][ T1493] ? __pfx_compaction_free+0x10/0x10
[ 104.572522][ T1493] migrate_pages+0x3d9/0x530
[ 104.578341][ T1493] ? __pfx_compaction_alloc+0x10/0x10
[ 104.585033][ T1493] ? __pfx_compaction_free+0x10/0x10
[ 104.591617][ T1493] compact_zone+0x286/0xa30
[ 104.597313][ T1493] kcompactd_do_work+0x103/0x2f0
[ 104.603487][ T1493] kcompactd+0x238/0x430
[ 104.608873][ T1493] ? __pfx_autoremove_wake_function+0x10/0x10
[ 104.616315][ T1493] ? __pfx_kcompactd+0x10/0x10
[ 104.622284][ T1493] kthread+0xcd/0x130
[ 104.627371][ T1493] ? __pfx_kthread+0x10/0x10
[ 104.633117][ T1493] ret_from_fork+0x31/0x70
[ 104.638664][ T1493] ? __pfx_kthread+0x10/0x10
[ 104.644390][ T1493] ret_from_fork_asm+0x1b/0x30
[ 104.650309][ T1493] </TASK>
[ 104.654264][ T1493] Modules linked in: xfs loop btrfs intel_rapl_msr blake2b_generic intel_rapl_common xor ses x86_pkg_temp_thermal enclosure raid6_pq sd_mod scsi_transport_sas intel_powerclamp libcrc32c sg coretemp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel nvme sha512_ssse3 ahci nvme_core rapl ast ipmi_ssif libahci t10_pi mei_me intel_cstate drm_shmem_helper crc64_rocksoft_generic i2c_i801 crc64_rocksoft acpi_ipmi drm_kms_helper megaraid_sas joydev dax_hmem intel_uncore libata mei i2c_ismt crc64 i2c_smbus wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter drm fuse ip_tables
[ 104.717626][ T1493] ---[ end trace 0000000000000000 ]---
[ 104.807226][ T1493] RIP: 0010:__list_del_entry_valid_or_report+0x6e/0xf0
[ 104.815628][ T1493] Code: b8 01 00 00 00 c3 cc cc cc cc 48 89 fe 48 c7 c7 80 c1 71 82 e8 e3 37 a3 ff 0f 0b 48 89 fe 48 c7 c7 b0 c1 71 82 e8 d2 37 a3 ff <0f> 0b 48 89 fe 48 c7 c7 e0 c1 71 82 e8 c1 37 a3 ff 0f 0b 48 89 fe
[ 104.838334][ T1493] RSP: 0018:ffa0000010a37910 EFLAGS: 00010046
[ 104.845773][ T1493] RAX: 0000000000000033 RBX: ff110080749b5ab8 RCX: 0000000000000000
[ 104.855343][ T1493] RDX: 0000000000000000 RSI: ff11007f416dc6c0 RDI: ff11007f416dc6c0
[ 104.864898][ T1493] RBP: ff110040b00af858 R08: 0000000000000000 R09: ffa0000010a377b8
[ 104.874452][ T1493] R10: 0000000000000003 R11: ff11007f40dfffe8 R12: ffd40001e3611400
[ 104.884001][ T1493] R13: 0000000000000000 R14: ffd40001e3611400 R15: ffa0000010a37938
[ 104.893543][ T1493] FS: 0000000000000000(0000) GS:ff11007f416c0000(0000) knlGS:0000000000000000
[ 104.904152][ T1493] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 104.912119][ T1493] CR2: 000055a550b5eb38 CR3: 0000008069078004 CR4: 0000000000f71ee0
[ 104.921634][ T1493] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 104.931149][ T1493] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 104.940655][ T1493] PKRU: 55555554
[ 104.945174][ T1493] Kernel panic - not syncing: Fatal exception
[ 105.991260][ T1493] Shutting down cpus with NMI
[ 106.046902][ T1493] Kernel Offset: disabled

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-09-21 03:18:01

by Luis Chamberlain

Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On Tue, Sep 12, 2023 at 12:28:11PM -0400, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Feel free to give comments and ask questions.

How about testing? I'm looking with an eye towards creating a
pathological situation for fragmentation which can be automated, to see
how things go.

Mel Gorman's original artificial fragmentation workload, taken from his
first patches to help with fragmentation avoidance from 2018, suggests he
tried [0]:

------ From 2018
a) Create an XFS filesystem

b) Start 4 fio threads that write a number of 64K files inefficiently.
Inefficiently means that files are created on first access and not
created in advance (fio parameter create_on_open=1) and fallocate is
not used (fallocate=none). With multiple IO issuers this creates a mix
of slab and page cache allocations over time. The total size of the
files is 150% of physical memory so that the slabs and page cache pages
get mixed

c) Warm up a number of fio read-only threads accessing the same files
created in step 2. This part runs for the same length of time it took to
create the files. It'll fault back in old data and further interleave
slab and page cache allocations. As it's now low on memory due to step
2, fragmentation occurs as pageblocks get stolen. While step 3 is still
running, start a process that tries to allocate 75% of memory as huge
pages with a number of threads. The number of threads is based on a
(NR_CPUS_SOCKET - NR_FIO_THREADS)/4 to avoid THP threads contending with
fio, any other threads or forcing cross-NUMA scheduling. Note that the
test has not been used on a machine with less than 8 cores. The
benchmark records whether huge pages were allocated and what the fault
latency was in microseconds

d) Measure the number of events potentially causing external fragmentation,
the fault latency and the huge page allocation success rate.
------- end of extract
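
For reference, step (b) above corresponds roughly to a fio job file like
the one below. This is a reconstruction from the description, not Mel's
actual job file; the directory, file count, and sizes are placeholders
you'd scale to your machine:

```ini
; Sketch of the "inefficient writer" phase from the 2018 recipe.
; Assumed values: scale nrfiles so total file size is ~150% of RAM.
[global]
directory=/mnt/xfs        ; the freshly created XFS filesystem
create_on_open=1          ; files created on first access, not in advance
fallocate=none            ; no preallocation
bs=64k
filesize=64k

[inefficient-writers]
rw=write
numjobs=4                 ; the 4 fio write threads
nrfiles=10000             ; placeholder: adjust to reach 150% of memory
```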

These days we can probably do a bit more damage. There have been concerns
that LBS support (block size > page size) could worsen fragmentation; one
of the reasons is that any file created, regardless of its size, will
require at least the block size, and if using a 64k block size that means
a 64k allocation for each new file on that 64k block size filesystem, so
clearly you may run out of lower order allocations pretty quickly. You
can also create different large block filesystems too, one for 64k and
another for 32k. Although LBS is new and we're still ironing out the
kinks, if you wanna give it a go we've rebased the patches onto Linus'
tree [1], and if you wanted to ramp up fast you could use kdevops [2],
which lets you pick that branch and also a series of NVMe drives (by
enabling CONFIG_LIBVIRT_EXTRA_STORAGE_DRIVE_NVME) for large IO
experimentation (by enabling CONFIG_VAGRANT_ENABLE_LARGEIO). Creating
different filesystems with large block sizes (64k, 32k, 16k) on a 4k
sector size drive (mkfs.xfs -f -b size=64k -s size=4k) should let you
easily do tons of crazy pathological things.

Are there other known recipes to help test this stuff?
How do we measure success against fragmentation in your patches, exactly?

[0] https://lwn.net/Articles/770235/
[1] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=large-block-linus-nobdev
[2] https://github.com/linux-kdevops/kdevops

Luis

2023-09-21 04:57:43

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On Wed, Sep 20, 2023 at 07:05:25PM -0700, John Hubbard wrote:
> On 9/20/23 18:16, Luis Chamberlain wrote:
> > On Wed, Sep 20, 2023 at 05:55:51PM -0700, Luis Chamberlain wrote:
> > > Are there other known recipes test help test this stuff?
> >
> > You know, it got me wondering... since how memory fragmented a system
> > might be by just running fstests, because, well, we already have
> > that automated in kdevops and it also has LBS support for all the
> > different large block sizes on 4k sector size. So if we just had a
> > way to "measure" or "quantify" memory fragmentation with a score,
> > we could just tally up how we did after 4 hours of testing for each
> > block size with a set of memory on the guest / target node / cloud
> > system.
> >
> > Luis
>
> I thought about it, and here is one possible way to quantify
> fragmentation with just a single number. Take this with some
> skepticism because it is a first draft sort of thing:
>
> a) Let BLOCKS be the number of 4KB pages (or more generally, then number
> of smallest sized objects allowed) in the area.
>
> b) Let FRAGS be the number of free *or* allocated chunks (no need to
> consider the size of each, as that is automatically taken into
> consideration).
>
> Then:
> fragmentation percentage = (FRAGS / BLOCKS) * 100%
>
> This has some nice properties. For one thing, it's easy to calculate.
> For another, it can discern between these cases:
>
> Assume a 12-page area:
>
> Case 1) 6 pages allocated allocated unevenly:
>
> 1 page allocated | 1 page free | 1 page allocated | 5 pages free | 4 pages allocated
>
> fragmentation = (5 FRAGS / 12 BLOCKS) * 100% = 41.7%
>
> Case 2) 6 pages allocated evenly: every other page is allocated:
>
> fragmentation = (12 FRAGS / 12 BLOCKS) * 100% = 100%

Thanks! Will try this!

BTW stress-ng might also be a nice way to do other pathological things here.

Luis

2023-09-21 06:26:08

by John Hubbard

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 9/20/23 18:16, Luis Chamberlain wrote:
> On Wed, Sep 20, 2023 at 05:55:51PM -0700, Luis Chamberlain wrote:
>> Are there other known recipes test help test this stuff?
>
> You know, it got me wondering... since how memory fragmented a system
> might be by just running fstests, because, well, we already have
> that automated in kdevops and it also has LBS support for all the
> different large block sizes on 4k sector size. So if we just had a
> way to "measure" or "quantify" memory fragmentation with a score,
> we could just tally up how we did after 4 hours of testing for each
> block size with a set of memory on the guest / target node / cloud
> system.
>
> Luis

I thought about it, and here is one possible way to quantify
fragmentation with just a single number. Take this with some
skepticism because it is a first draft sort of thing:

a) Let BLOCKS be the number of 4KB pages (or more generally, the number
of smallest-sized objects allowed) in the area.

b) Let FRAGS be the number of free *or* allocated chunks (no need to
consider the size of each, as that is automatically taken into
consideration).

Then:
fragmentation percentage = (FRAGS / BLOCKS) * 100%

This has some nice properties. For one thing, it's easy to calculate.
For another, it can discern between these cases:

Assume a 12-page area:

Case 1) 6 pages allocated unevenly:

1 page allocated | 1 page free | 1 page allocated | 5 pages free | 4 pages allocated

fragmentation = (5 FRAGS / 12 BLOCKS) * 100% = 41.7%

Case 2) 6 pages allocated evenly: every other page is allocated:

fragmentation = (12 FRAGS / 12 BLOCKS) * 100% = 100%
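
As a quick sanity check, the metric is trivial to compute; here is a
small Python sketch (purely illustrative) that reproduces the two cases
above, modeling an area as a list of per-page allocated flags:

```python
from itertools import groupby

def fragmentation_pct(area):
    """FRAGS / BLOCKS * 100, per the definition above.

    `area` is a list of booleans, one per smallest-sized block
    (True = allocated, False = free). FRAGS is the number of maximal
    runs of same-state blocks, free *or* allocated.
    """
    blocks = len(area)
    frags = sum(1 for _state, _run in groupby(area))
    return frags / blocks * 100

# Case 1: 1 alloc | 1 free | 1 alloc | 5 free | 4 alloc -> 5 frags
case1 = [True] + [False] + [True] + [False] * 5 + [True] * 4
# Case 2: every other page allocated -> 12 frags
case2 = [i % 2 == 0 for i in range(12)]
```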



thanks,
--
John Hubbard
NVIDIA

2023-09-21 07:04:49

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On Wed, Sep 20, 2023 at 05:55:51PM -0700, Luis Chamberlain wrote:
> Are there other known recipes test help test this stuff?

You know, it got me wondering how fragmented a system's memory might
get by just running fstests, because, well, we already have
that automated in kdevops and it also has LBS support for all the
different large block sizes on a 4k sector size. So if we just had a
way to "measure" or "quantify" memory fragmentation with a score,
we could just tally up how we did after 4 hours of testing for each
block size, with a set amount of memory on the guest / target node / cloud
system.

Luis

2023-09-22 02:46:36

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction


On 20 Sep 2023, at 23:14, Luis Chamberlain wrote:

> On Wed, Sep 20, 2023 at 07:05:25PM -0700, John Hubbard wrote:
>> On 9/20/23 18:16, Luis Chamberlain wrote:
>>> On Wed, Sep 20, 2023 at 05:55:51PM -0700, Luis Chamberlain wrote:
>>>> Are there other known recipes test help test this stuff?
>>>
>>> You know, it got me wondering... since how memory fragmented a system
>>> might be by just running fstests, because, well, we already have
>>> that automated in kdevops and it also has LBS support for all the
>>> different large block sizes on 4k sector size. So if we just had a
>>> way to "measure" or "quantify" memory fragmentation with a score,
>>> we could just tally up how we did after 4 hours of testing for each
>>> block size with a set of memory on the guest / target node / cloud
>>> system.
>>>
>>> Luis
>>
>> I thought about it, and here is one possible way to quantify
>> fragmentation with just a single number. Take this with some
>> skepticism because it is a first draft sort of thing:
>>
>> a) Let BLOCKS be the number of 4KB pages (or more generally, then number
>> of smallest sized objects allowed) in the area.
>>
>> b) Let FRAGS be the number of free *or* allocated chunks (no need to
>> consider the size of each, as that is automatically taken into
>> consideration).
>>
>> Then:
>> fragmentation percentage = (FRAGS / BLOCKS) * 100%
>>
>> This has some nice properties. For one thing, it's easy to calculate.
>> For another, it can discern between these cases:
>>
>> Assume a 12-page area:
>>
>> Case 1) 6 pages allocated allocated unevenly:
>>
>> 1 page allocated | 1 page free | 1 page allocated | 5 pages free | 4 pages allocated
>>
>> fragmentation = (5 FRAGS / 12 BLOCKS) * 100% = 41.7%
>>
>> Case 2) 6 pages allocated evenly: every other page is allocated:
>>
>> fragmentation = (12 FRAGS / 12 BLOCKS) * 100% = 100%
>
> Thanks! Will try this!
>
> BTW stress-ng might also be a nice way to do other pathalogical things here.
>
> Luis

Thanks. These are all good performance tests and a good fragmentation metric.
I would like to get the patchset working properly first. As I mentioned in
another email, there will be tons of exploration to do to improve >0 order
folio memory compaction, taking into consideration:

1. the distribution of free pages,
2. the goal of compaction, e.g., to allocate a single order folio or reduce
the overall fragmentation level,
3. the runtime cost of compaction, and more.

My patchset aims to provide reasonably working compaction functionality.


In terms of correctness testing, what I have done locally is to:

1. have an XFS partition,
2. create files with various sizes from 4KB to 2MB,
3. mmap each of these files to use one folio at the file size,
4. get the physical addresses of these folios,
5. trigger global memory compaction via sysctl,
6. read the physical addresses of these folios again.
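
For anyone wanting to reproduce something similar, steps 4-6 can be
approximated from userspace via /proc/<pid>/pagemap and the
compact_memory sysctl. A rough Python sketch (my own illustration, not
part of the patchset; real PFNs require CAP_SYS_ADMIN, otherwise
pagemap reports zeros):

```python
import struct

PAGEMAP_ENTRY_SIZE = 8        # one 64-bit entry per virtual page
PFN_MASK = (1 << 55) - 1      # bits 0-54 of an entry hold the PFN

def pagemap_offset(vaddr, page_size=4096):
    """Byte offset of vaddr's entry in /proc/<pid>/pagemap."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE

def read_pfn(pid, vaddr, page_size=4096):
    """Return the PFN backing vaddr, or None if the page is not present."""
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (entry,) = struct.unpack("<Q", f.read(PAGEMAP_ENTRY_SIZE))
    return (entry & PFN_MASK) if entry & (1 << 63) else None

# Step 5 is then, as root:  echo 1 > /proc/sys/vm/compact_memory
# and step 6 re-reads the PFNs to see which folios moved.
```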

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2023-10-02 12:59:14

by Ryan Roberts

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

Hi Zi,

On 12/09/2023 17:28, Zi Yan wrote:
> From: Zi Yan <[email protected]>
>
> Hi all,
>
> This patchset enables >0 order folio memory compaction, which is one of
> the prerequisitions for large folio support[1]. It is on top of
> mm-everything-2023-09-11-22-56.

I've taken a quick look at these and realize I'm not well equipped to provide
much in the way of meaningful review comments; All I can say is thanks for
putting this together, and yes, I think it will become even more important for
my work on anonymous large folios.


>
> Overview
> ===
>
> To support >0 order folio compaction, the patchset changes how free pages used
> for migration are kept during compaction. Free pages used to be split into
> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
> page order stored in page->private is zeroed, and page reference is set to 1).
> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
> on their order without post allocation process. When migrate_pages() asks for
> a new page, one of the free pages, based on the requested page order, is
> then processed and given out.
>
>
> Optimizations
> ===
>
> 1. Free page split is added to increase migration success rate in case
> a source page does not have a matched free page in the free page lists.
> Free page merge is possible but not implemented, since existing
> PFN-based buddy page merge algorithm requires the identification of
> buddy pages, but free pages kept for memory compaction cannot have
> PageBuddy set to avoid confusing other PFN scanners.
>
> 2. Sort source pages in ascending order before migration is added to
> reduce free page split. Otherwise, high order free pages might be
> prematurely split, causing undesired high order folio migration failures.

Not knowing much about how compaction actually works, naively I would imagine
that if you are just trying to free up a known amount of contiguous physical
space, then working through the pages in PFN order is more likely to yield the
result quicker? Unless all of the pages in the set must be successfully migrated
in order to free up the required amount of space...

Thanks,
Ryan

2023-10-09 07:14:59

by Huang, Ying

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

Hi, Zi,

Thanks for your patch!

Zi Yan <[email protected]> writes:

> From: Zi Yan <[email protected]>
>
> Hi all,
>
> This patchset enables >0 order folio memory compaction, which is one of
> the prerequisitions for large folio support[1]. It is on top of
> mm-everything-2023-09-11-22-56.
>
> Overview
> ===
>
> To support >0 order folio compaction, the patchset changes how free pages used
> for migration are kept during compaction.

migrate_pages() can split a large folio on allocation failure. So
the minimal implementation could be:

- allow migrating large folios in compaction
- return -ENOMEM for order > 0 in compaction_alloc()

The performance may not be desirable. But that may be a baseline for
further optimization.

And, if we can measure the performance for each step of optimization,
that will be even better.

> Free pages used to be split into
> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
> page order stored in page->private is zeroed, and page reference is set to 1).
> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
> on their order without post allocation process. When migrate_pages() asks for
> a new page, one of the free pages, based on the requested page order, is
> then processed and given out.
>
>
> Optimizations
> ===
>
> 1. Free page split is added to increase migration success rate in case
> a source page does not have a matched free page in the free page lists.
> Free page merge is possible but not implemented, since existing
> PFN-based buddy page merge algorithm requires the identification of
> buddy pages, but free pages kept for memory compaction cannot have
> PageBuddy set to avoid confusing other PFN scanners.
>
> 2. Sort source pages in ascending order before migration is added to

Trivial.

s/ascending/descending/

> reduce free page split. Otherwise, high order free pages might be
> prematurely split, causing undesired high order folio migration failures.
>
>
> TODOs
> ===
>
> 1. Refactor free page post allocation and free page preparation code so
> that compaction_alloc() and compaction_free() can call functions instead
> of hard coding.
>
> 2. One possible optimization is to allow migrate_pages() to continue
> even if get_new_folio() returns a NULL. In general, that means there is
> not enough memory. But in >0 order folio compaction case, that means
> there is no suitable free page at source page order. It might be better
> to skip that page and finish the rest of migration to achieve a better
> compaction result.

We can split the source folio if get_new_folio() returns NULL. So, do
we really need this?

In general, we may reconsider all further optimizations given splitting
is available already.

> 3. Another possible optimization is to enable free page merge. It is
> possible that a to-be-migrated page causes free page split then fails to
> migrate eventually. We would lose a high order free page without free
> page merge function. But a way of identifying free pages for memory
> compaction is needed to reuse existing PFN-based buddy page merge.
>
> 4. The implemented >0 order folio compaction algorithm is quite naive
> and does not consider all possible situations. A better algorithm can
> improve compaction success rate.
>
>
> Feel free to give comments and ask questions.
>
> Thanks.
>
>
> [1] https://lore.kernel.org/linux-mm/[email protected]/
>
> Zi Yan (4):
> mm/compaction: add support for >0 order folio memory compaction.
> mm/compaction: optimize >0 order folio compaction with free page
> split.
> mm/compaction: optimize >0 order folio compaction by sorting source
> pages.
> mm/compaction: enable compacting >0 order folios.
>
> mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
> mm/internal.h | 7 +-
> 2 files changed, 176 insertions(+), 36 deletions(-)

--
Best Regards,
Huang, Ying

2023-10-09 13:25:59

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 2 Oct 2023, at 8:32, Ryan Roberts wrote:

> Hi Zi,
>
> On 12/09/2023 17:28, Zi Yan wrote:
>> From: Zi Yan <[email protected]>
>>
>> Hi all,
>>
>> This patchset enables >0 order folio memory compaction, which is one of
>> the prerequisitions for large folio support[1]. It is on top of
>> mm-everything-2023-09-11-22-56.
>
> I've taken a quick look at these and realize I'm not well equipped to provide
> much in the way of meaningful review comments; All I can say is thanks for
> putting this together, and yes, I think it will become even more important for
> my work on anonymous large folios.
>
>
>>
>> Overview
>> ===
>>
>> To support >0 order folio compaction, the patchset changes how free pages used
>> for migration are kept during compaction. Free pages used to be split into
>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>> page order stored in page->private is zeroed, and page reference is set to 1).
>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>> on their order without post allocation process. When migrate_pages() asks for
>> a new page, one of the free pages, based on the requested page order, is
>> then processed and given out.
>>
>>
>> Optimizations
>> ===
>>
>> 1. Free page split is added to increase migration success rate in case
>> a source page does not have a matched free page in the free page lists.
>> Free page merge is possible but not implemented, since existing
>> PFN-based buddy page merge algorithm requires the identification of
>> buddy pages, but free pages kept for memory compaction cannot have
>> PageBuddy set to avoid confusing other PFN scanners.
>>
>> 2. Sort source pages in ascending order before migration is added to
>> reduce free page split. Otherwise, high order free pages might be
>> prematurely split, causing undesired high order folio migration failures.
>
> Not knowing much about how compaction actually works, naively I would imagine
> that if you are just trying to free up a known amount of contiguous physical
> space, then working through the pages in PFN order is more likely to yield the
> result quicker? Unless all of the pages in the set must be successfully migrated
> in order to free up the required amount of space...

During compaction, pages are not freed, since that is the job of page reclaim.
The goal of compaction is to get a high order free page without freeing existing
pages, to avoid potentially high cost IO operations. If compaction does not work,
page reclaim would free pages to get us there (and potentially a follow-up
compaction). So pages are either migrated or stay where they are during compaction.

BTW compaction works by scanning in-use pages from lower PFN to higher PFN,
and free pages from higher PFN to lower PFN, until the two scanners meet in the
middle.
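
The two-scanner walk can be pictured with a toy model (purely
illustrative, ignoring pageblocks, skip bits, and all the real
bookkeeping; 'U' marks an in-use movable page, 'F' a free page):

```python
def scan(pages):
    """Toy compaction scanners: the migration scanner walks up from
    PFN 0 collecting in-use ('U') pages to migrate; the free scanner
    walks down from the top collecting free ('F') pages as targets.
    They stop once they cross, so only the lower-PFN in-use pages and
    higher-PFN free pages are ever paired up."""
    lo, hi = 0, len(pages) - 1
    movable, targets = [], []
    while lo < hi:
        if pages[lo] == 'U':
            movable.append(lo)
        lo += 1
        if pages[hi] == 'F':
            targets.append(hi)
        hi -= 1
    return movable, targets

# In-use pages at PFNs 0 and 2 get free targets at PFNs 6, 5, 4:
print(scan(list("UFUFFFFU")))
```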

--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2023-10-09 13:44:27

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 9 Oct 2023, at 3:12, Huang, Ying wrote:

> Hi, Zi,
>
> Thanks for your patch!
>
> Zi Yan <[email protected]> writes:
>
>> From: Zi Yan <[email protected]>
>>
>> Hi all,
>>
>> This patchset enables >0 order folio memory compaction, which is one of
>> the prerequisitions for large folio support[1]. It is on top of
>> mm-everything-2023-09-11-22-56.
>>
>> Overview
>> ===
>>
>> To support >0 order folio compaction, the patchset changes how free pages used
>> for migration are kept during compaction.
>
> migrate_pages() can split the large folio for allocation failure. So
> the minimal implementation could be
>
> - allow to migrate large folios in compaction
> - return -ENOMEM for order > 0 in compaction_alloc()
>
> The performance may be not desirable. But that may be a baseline for
> further optimization.

I would imagine it might cause a regression since compaction might gradually
split high order folios in the system. But I can move Patch 4 first to make this
the baseline and see how system performance changes.

>
> And, if we can measure the performance for each step of optimization,
> that will be even better.

Do you have any benchmark in mind for the performance tests? vm-scalability?

>
>> Free pages used to be split into
>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>> page order stored in page->private is zeroed, and page reference is set to 1).
>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>> on their order without post allocation process. When migrate_pages() asks for
>> a new page, one of the free pages, based on the requested page order, is
>> then processed and given out.
>>
>>
>> Optimizations
>> ===
>>
>> 1. Free page split is added to increase migration success rate in case
>> a source page does not have a matched free page in the free page lists.
>> Free page merge is possible but not implemented, since existing
>> PFN-based buddy page merge algorithm requires the identification of
>> buddy pages, but free pages kept for memory compaction cannot have
>> PageBuddy set to avoid confusing other PFN scanners.
>>
>> 2. Sort source pages in ascending order before migration is added to
>
> Trivial.
>
> s/ascending/descending/
>
>> reduce free page split. Otherwise, high order free pages might be
>> prematurely split, causing undesired high order folio migration failures.
>>
>>
>> TODOs
>> ===
>>
>> 1. Refactor free page post allocation and free page preparation code so
>> that compaction_alloc() and compaction_free() can call functions instead
>> of hard coding.
>>
>> 2. One possible optimization is to allow migrate_pages() to continue
>> even if get_new_folio() returns a NULL. In general, that means there is
>> not enough memory. But in >0 order folio compaction case, that means
>> there is no suitable free page at source page order. It might be better
>> to skip that page and finish the rest of migration to achieve a better
>> compaction result.
>
> We can split the source folio if get_new_folio() returns NULL. So, do
> we really need this?

It depends. The situation it can benefit is when the system is about to
allocate a high order free page and triggers a compaction: it is possible to
get the high order free page by migrating a bunch of base pages instead of
splitting an existing high order folio.

>
> In general, we may reconsider all further optimizations given splitting
> is available already.

In my mind, splits should be avoided as much as possible. But it really depends
on the actual situation, e.g., how much effort and cost the compaction wants
to pay to get memory defragmented. If the system really wants to get a high
order free page at any cost, split can be used without any issue. But applications
might lose performance because existing large folios are split just to create a
new one.

Like I said in the other email, there are tons of optimizations and policies for us
to explore. We can start with the bare minimum support (if no performance
regression is observed, we can even start with splitting all high order folios like
you suggested) and add optimizations one by one.

>
>> 3. Another possible optimization is to enable free page merge. It is
>> possible that a to-be-migrated page causes free page split then fails to
>> migrate eventually. We would lose a high order free page without free
>> page merge function. But a way of identifying free pages for memory
>> compaction is needed to reuse existing PFN-based buddy page merge.
>>
>> 4. The implemented >0 order folio compaction algorithm is quite naive
>> and does not consider all possible situations. A better algorithm can
>> improve compaction success rate.
>>
>>
>> Feel free to give comments and ask questions.
>>
>> Thanks.
>>
>>
>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>>
>> Zi Yan (4):
>> mm/compaction: add support for >0 order folio memory compaction.
>> mm/compaction: optimize >0 order folio compaction with free page
>> split.
>> mm/compaction: optimize >0 order folio compaction by sorting source
>> pages.
>> mm/compaction: enable compacting >0 order folios.
>>
>> mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>> mm/internal.h | 7 +-
>> 2 files changed, 176 insertions(+), 36 deletions(-)
>
> --
> Best Regards,
> Huang, Ying


--
Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2023-10-09 14:11:01

by Ryan Roberts

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 09/10/2023 14:24, Zi Yan wrote:
> On 2 Oct 2023, at 8:32, Ryan Roberts wrote:
>
>> Hi Zi,
>>
>> On 12/09/2023 17:28, Zi Yan wrote:
>>> From: Zi Yan <[email protected]>
>>>
>>> Hi all,
>>>
>>> This patchset enables >0 order folio memory compaction, which is one of
>>> the prerequisitions for large folio support[1]. It is on top of
>>> mm-everything-2023-09-11-22-56.
>>
>> I've taken a quick look at these and realize I'm not well equipped to provide
>> much in the way of meaningful review comments; All I can say is thanks for
>> putting this together, and yes, I think it will become even more important for
>> my work on anonymous large folios.
>>
>>
>>>
>>> Overview
>>> ===
>>>
>>> To support >0 order folio compaction, the patchset changes how free pages used
>>> for migration are kept during compaction. Free pages used to be split into
>>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>>> page order stored in page->private is zeroed, and page reference is set to 1).
>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>> on their order without post allocation process. When migrate_pages() asks for
>>> a new page, one of the free pages, based on the requested page order, is
>>> then processed and given out.
>>>
>>>
>>> Optimizations
>>> ===
>>>
>>> 1. Free page split is added to increase migration success rate in case
>>> a source page does not have a matched free page in the free page lists.
>>> Free page merge is possible but not implemented, since existing
>>> PFN-based buddy page merge algorithm requires the identification of
>>> buddy pages, but free pages kept for memory compaction cannot have
>>> PageBuddy set to avoid confusing other PFN scanners.
>>>
>>> 2. Sort source pages in ascending order before migration is added to
>>> reduce free page split. Otherwise, high order free pages might be
>>> prematurely split, causing undesired high order folio migration failures.
>>
>> Not knowing much about how compaction actually works, naively I would imagine
>> that if you are just trying to free up a known amount of contiguous physical
>> space, then working through the pages in PFN order is more likely to yield the
>> result quicker? Unless all of the pages in the set must be successfully migrated
>> in order to free up the required amount of space...
>
> During compaction, pages are not freed, since that is the job of page reclaim.

Sorry yes - my fault for using sloppy language. When I said "free up a known
amount of contiguous physical space", I really meant "move pages in order to
recover an amount of contiguous physical space". But I still think the rest of
what I said applies; wouldn't you be more likely to reach your goal quicker if
you sort by PFN?

> The goal of compaction is to get a high order free page without freeing existing
> pages to avoid potential high cost IO operations. If compaction does not work,
> page reclaim would free pages to get us there (and potentially another follow-up
> compaction). So either pages are migrated or stay where they are during compaction.
>
> BTW compaction works by scanning in use pages from lower PFN to higher PFN,
> and free pages from higher PFN to lower PFN until two scanners meet in the middle.
>
> --
> Best Regards,
> Yan, Zi

2023-10-09 15:52:47

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

(resent as plain text)
On 9 Oct 2023, at 10:10, Ryan Roberts wrote:

> On 09/10/2023 14:24, Zi Yan wrote:
>> On 2 Oct 2023, at 8:32, Ryan Roberts wrote:
>>
>>> Hi Zi,
>>>
>>> On 12/09/2023 17:28, Zi Yan wrote:
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> Hi all,
>>>>
>>>> This patchset enables >0 order folio memory compaction, which is one of
>>>> the prerequisitions for large folio support[1]. It is on top of
>>>> mm-everything-2023-09-11-22-56.
>>>
>>> I've taken a quick look at these and realize I'm not well equipped to provide
>>> much in the way of meaningful review comments; All I can say is thanks for
>>> putting this together, and yes, I think it will become even more important for
>>> my work on anonymous large folios.
>>>
>>>
>>>>
>>>> Overview
>>>> ===
>>>>
>>>> To support >0 order folio compaction, the patchset changes how free pages used
>>>> for migration are kept during compaction. Free pages used to be split into
>>>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>>>> page order stored in page->private is zeroed, and page reference is set to 1).
>>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>>> on their order without post allocation process. When migrate_pages() asks for
>>>> a new page, one of the free pages, based on the requested page order, is
>>>> then processed and given out.
>>>>
>>>>
>>>> Optimizations
>>>> ===
>>>>
>>>> 1. Free page split is added to increase migration success rate in case
>>>> a source page does not have a matched free page in the free page lists.
>>>> Free page merge is possible but not implemented, since existing
>>>> PFN-based buddy page merge algorithm requires the identification of
>>>> buddy pages, but free pages kept for memory compaction cannot have
>>>> PageBuddy set to avoid confusing other PFN scanners.
>>>>
>>>> 2. Sort source pages in ascending order before migration is added to
>>>> reduce free page split. Otherwise, high order free pages might be
>>>> prematurely split, causing undesired high order folio migration failures.
>>>
>>> Not knowing much about how compaction actually works, naively I would imagine
>>> that if you are just trying to free up a known amount of contiguous physical
>>> space, then working through the pages in PFN order is more likely to yield the
>>> result quicker? Unless all of the pages in the set must be successfully migrated
>>> in order to free up the required amount of space...
>>
>> During compaction, pages are not freed, since that is the job of page reclaim.
>
> Sorry yes - my fault for using sloppy language. When I said "free up a known
> amount of contiguous physical space", I really meant "move pages in order to
> recover an amount of contiguous physical space". But I still think the rest of
> what I said applies; wouldn't you be more likely to reach your goal quicker if
> you sort by PFN?

Not always. Suppose the in-use folios on the left are order-2, order-2, order-4
(all contiguous in one pageblock) and the free pages on the right are order-4
(pageblock N) and order-2, order-2 (pageblock N-1); it is not a single order-8
free range, since there are in-use folios in the middle. Going in PFN order will
not get you an order-8 free page, since the first order-4 free page will be split
into two order-2 pieces for the first two order-2 in-use folios. But if you
migrate in descending order of in-use folio orders, you can get an order-8 free
page at the end.

The patchset minimizes free page splits to avoid the situation described above,
since once a high order free page is split, the opportunity to migrate a high
order in-use folio into it is gone and hardly recoverable.
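
To make that concrete, here is a toy simulation (my own sketch, not code
from the patchset): it tracks only free page orders, ignores PFN
geometry, and models the free scanner handing over pages lazily in scan
order, with buddy-style splitting when no exact-order page is on hand:

```python
def buddy_split(order, want):
    """Buddy-style split of a free page of `order` down to `want`:
    each split halves the block, leaving one buddy of each order from
    `want` up to `order` - 1 behind as smaller free pages."""
    return list(range(want, order))

def migrate(requests, scan_order):
    """Toy compaction: free pages appear incrementally in `scan_order`;
    each source folio takes the smallest sufficient free page,
    splitting it if needed. Returns (successes, splits)."""
    free, pending = [], list(scan_order)
    successes = splits = 0
    for want in requests:
        # keep scanning until a big-enough free page is on hand
        while not any(o >= want for o in free) and pending:
            free.append(pending.pop(0))
        fits = [o for o in free if o >= want]
        if not fits:
            continue              # migration of this folio fails
        pick = min(fits)          # best fit among what was scanned
        free.remove(pick)
        if pick > want:
            splits += 1
            free.extend(buddy_split(pick, want))
        successes += 1
    return successes, splits

# Source folios are order 2, 2, 4; the free scanner finds 4, 2, 2.
print(migrate([2, 2, 4], [4, 2, 2]))   # PFN order: the order-4 splits
print(migrate([4, 2, 2], [4, 2, 2]))   # descending order: no splits
```

In PFN order the order-4 free page is split for the first order-2 folio
and the order-4 folio then fails; in descending order everything
migrates with zero splits.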


>> The goal of compaction is to get a high order free page without freeing existing
>> pages to avoid potential high cost IO operations. If compaction does not work,
>> page reclaim would free pages to get us there (and potentially another follow-up
>> compaction). So either pages are migrated or stay where they are during compaction.
>>
>> BTW compaction works by scanning in use pages from lower PFN to higher PFN,
>> and free pages from higher PFN to lower PFN until two scanners meet in the middle.
>>
>> --
>> Best Regards,
>> Yan, Zi


Best Regards,
Yan, Zi


Attachments:
signature.asc (871.00 B)
OpenPGP digital signature

2023-10-10 06:10:38

by Huang, Ying

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction


Something went wrong with my mailbox. Sorry if you received duplicated
mail.

Zi Yan <[email protected]> writes:

> On 9 Oct 2023, at 3:12, Huang, Ying wrote:
>
>> Hi, Zi,
>>
>> Thanks for your patch!
>>
>> Zi Yan <[email protected]> writes:
>>
>>> From: Zi Yan <[email protected]>
>>>
>>> Hi all,
>>>
>>> This patchset enables >0 order folio memory compaction, which is one of
>>> the prerequisites for large folio support[1]. It is on top of
>>> mm-everything-2023-09-11-22-56.
>>>
>>> Overview
>>> ===
>>>
>>> To support >0 order folio compaction, the patchset changes how free pages used
>>> for migration are kept during compaction.
>>
>> migrate_pages() can split the large folio on allocation failure. So
>> the minimal implementation could be
>>
>> - allow to migrate large folios in compaction
>> - return -ENOMEM for order > 0 in compaction_alloc()
>>
>> The performance may not be desirable. But that may be a baseline for
>> further optimization.
>
> I would imagine it might cause a regression since compaction might gradually
> split high order folios in the system.

I would not call it a pure regression, since large folios can still be
migrated during compaction with that approach, but it's possible that
this hurts performance.

Anyway, this can be a not-so-good minimal baseline.

> But I can move Patch 4 first to make this the baseline and see how
> system performance changes.

Thanks!

>>
>> And, if we can measure the performance for each step of optimization,
>> that will be even better.
>
> Do you have any benchmark in mind for the performance tests? vm-scalability?

I remember Mel Gorman has done some tests for defragmentation before.
But that's for order-0 pages.

>>> Free pages used to be split into
>>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>>> page order stored in page->private is zeroed, and page reference is set to 1).
>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>> on their order without post allocation process. When migrate_pages() asks for
>>> a new page, one of the free pages, based on the requested page order, is
>>> then processed and given out.
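As a rough sketch of that bookkeeping (a toy Python model, not the kernel implementation; the class and method names are made up for illustration):

```python
# Toy model of the reworked free-page bookkeeping (not kernel code):
# instead of splitting everything to order 0 up front, isolated free
# pages sit in a per-order array of lists and are only post-processed
# (PageBuddy cleared, order zeroed, refcount set) when migrate_pages()
# asks for a page of that order.

MAX_ORDER = 10

class FreePages:
    def __init__(self):
        # one list per order, indices 0 .. MAX_ORDER
        self.lists = [[] for _ in range(MAX_ORDER + 1)]

    def stash(self, pfn, order):
        """Keep an isolated free page without post-allocation processing."""
        self.lists[order].append(pfn)

    def take(self, order):
        """compaction_alloc()-like: hand out one page of exactly `order`."""
        if self.lists[order]:
            pfn = self.lists[order].pop()
            return pfn  # post-allocation processing would happen here
        return None     # caller may then try a split, or fail

fp = FreePages()
fp.stash(pfn=512, order=4)
fp.stash(pfn=528, order=2)
assert fp.take(2) == 528
assert fp.take(2) is None   # no order-2 left; the order-4 page is untouched
assert fp.take(4) == 512
```

The point of the per-order lists is visible in the second `take(2)`: with no exact match, the order-4 page is left intact rather than being pre-split, leaving the split-or-fail decision to the caller.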
>>>
>>>
>>> Optimizations
>>> ===
>>>
>>> 1. Free page split is added to increase migration success rate in case
>>> a source page does not have a matched free page in the free page lists.
>>> Free page merge is possible but not implemented, since existing
>>> PFN-based buddy page merge algorithm requires the identification of
>>> buddy pages, but free pages kept for memory compaction cannot have
>>> PageBuddy set to avoid confusing other PFN scanners.
>>>
>>> 2. Sort source pages in ascending order before migration is added to
>>
>> Trivial.
>>
>> s/ascending/descending/
>>
>>> reduce free page split. Otherwise, high order free pages might be
>>> prematurely split, causing undesired high order folio migration failures.
>>>
>>>
>>> TODOs
>>> ===
>>>
>>> 1. Refactor free page post allocation and free page preparation code so
>>> that compaction_alloc() and compaction_free() can call functions instead
>>> of hard coding.
>>>
>>> 2. One possible optimization is to allow migrate_pages() to continue
>> even if get_new_folio() returns NULL. In general, that means there is
>>> not enough memory. But in >0 order folio compaction case, that means
>>> there is no suitable free page at source page order. It might be better
>>> to skip that page and finish the rest of migration to achieve a better
>>> compaction result.
>>
>> We can split the source folio if get_new_folio() returns NULL. So, do
>> we really need this?
>
> It depends. The situation where it helps is when the system is about to
> allocate a high order free page and triggers compaction: it may be possible
> to get the high order free page by migrating a bunch of base pages instead
> of splitting an existing high order folio.
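That skip-instead-of-abort idea could be modeled roughly like this (plain Python, illustrative only; migrate_batch and the dict-based free counts are invented for the sketch):

```python
# Toy model of the TODO: let the migration loop skip a folio whose order
# has no suitable free page, instead of aborting the whole batch
# (not the real migrate_pages()).

def migrate_batch(src_orders, get_new_folio):
    """Return (migrated, skipped) lists of folio orders."""
    migrated, skipped = [], []
    for order in src_orders:
        if get_new_folio(order) is None:
            skipped.append(order)   # no free page of this order: skip it,
            continue                # keep compacting the rest
        migrated.append(order)
    return migrated, skipped

# Invented free-page counts: two order-0 pages and one order-2 page.
free = {0: 2, 2: 1}

def get_new_folio(order):
    if free.get(order, 0) > 0:
        free[order] -= 1
        return order
    return None

assert migrate_batch([2, 4, 0, 0], get_new_folio) == ([2, 0, 0], [4])
```

The order-4 folio is skipped for lack of a matching free page, while the rest of the batch still migrates, which is the better-compaction-result behavior the TODO describes.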
>
>>
>> In general, we may reconsider all further optimizations given splitting
>> is available already.
>
> In my mind, split should be avoided as much as possible.

If so, should we use the "nosplit" logic in migrate_pages_batch() in some
situations?

> But it really depends
> on the actual situation, e.g., how much effort and cost compaction is willing
> to pay to get memory defragmented. If the system really wants to get a high
> order free page at any cost, split can be used without any issue. But
> applications might lose performance because existing large folios are split
> just to form a new one.

Is it possible that splitting is desirable in some situation? For
example, allocate some large DMA buffers at the cost of large anonymous
folios?

> Like I said in the email, there are tons of optimizations and policies for us
> to explore. We can start with the bare minimum support (if no performance
> regression is observed, we can even start with split all high folios like you
> suggested) and add optimizations one by one.

Sounds good to me! Thanks!

>>
>>> 3. Another possible optimization is to enable free page merge. It is
>>> possible that a to-be-migrated page causes free page split then fails to
>>> migrate eventually. We would lose a high order free page without free
>>> page merge function. But a way of identifying free pages for memory
>>> compaction is needed to reuse existing PFN-based buddy page merge.
>>>
>>> 4. The implemented >0 order folio compaction algorithm is quite naive
>>> and does not consider all possible situations. A better algorithm can
>>> improve compaction success rate.
>>>
>>>
>>> Feel free to give comments and ask questions.
>>>
>>> Thanks.
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>>>
>>> Zi Yan (4):
>>> mm/compaction: add support for >0 order folio memory compaction.
>>> mm/compaction: optimize >0 order folio compaction with free page
>>> split.
>>> mm/compaction: optimize >0 order folio compaction by sorting source
>>> pages.
>>> mm/compaction: enable compacting >0 order folios.
>>>
>>> mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>>> mm/internal.h | 7 +-
>>> 2 files changed, 176 insertions(+), 36 deletions(-)

--
Best Regards,
Huang, Ying

2023-10-10 10:00:52

by Ryan Roberts

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 09/10/2023 16:52, Zi Yan wrote:
> (resent as plain text)
> On 9 Oct 2023, at 10:10, Ryan Roberts wrote:
>
>> On 09/10/2023 14:24, Zi Yan wrote:
>>> On 2 Oct 2023, at 8:32, Ryan Roberts wrote:
>>>
>>>> Hi Zi,
>>>>
>>>> On 12/09/2023 17:28, Zi Yan wrote:
>>>>> From: Zi Yan <[email protected]>
>>>>>
>>>>> Hi all,
>>>>>
>>>>> This patchset enables >0 order folio memory compaction, which is one of
>>>>> the prerequisites for large folio support[1]. It is on top of
>>>>> mm-everything-2023-09-11-22-56.
>>>>
>>>> I've taken a quick look at these and realize I'm not well equipped to provide
>>>> much in the way of meaningful review comments; All I can say is thanks for
>>>> putting this together, and yes, I think it will become even more important for
>>>> my work on anonymous large folios.
>>>>
>>>>
>>>>>
>>>>> Overview
>>>>> ===
>>>>>
>>>>> To support >0 order folio compaction, the patchset changes how free pages used
>>>>> for migration are kept during compaction. Free pages used to be split into
>>>>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>>>>> page order stored in page->private is zeroed, and page reference is set to 1).
>>>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>>>> on their order without post allocation process. When migrate_pages() asks for
>>>>> a new page, one of the free pages, based on the requested page order, is
>>>>> then processed and given out.
>>>>>
>>>>>
>>>>> Optimizations
>>>>> ===
>>>>>
>>>>> 1. Free page split is added to increase migration success rate in case
>>>>> a source page does not have a matched free page in the free page lists.
>>>>> Free page merge is possible but not implemented, since existing
>>>>> PFN-based buddy page merge algorithm requires the identification of
>>>>> buddy pages, but free pages kept for memory compaction cannot have
>>>>> PageBuddy set to avoid confusing other PFN scanners.
>>>>>
>>>>> 2. Sort source pages in ascending order before migration is added to
>>>>> reduce free page split. Otherwise, high order free pages might be
>>>>> prematurely split, causing undesired high order folio migration failures.
>>>>
>>>> Not knowing much about how compaction actually works, naively I would imagine
>>>> that if you are just trying to free up a known amount of contiguous physical
>>>> space, then working through the pages in PFN order is more likely to yield the
>>>> result quicker? Unless all of the pages in the set must be successfully migrated
>>>> in order to free up the required amount of space...
>>>
>>> During compaction, pages are not freed, since that is the job of page reclaim.
>>
>> Sorry yes - my fault for using sloppy language. When I said "free up a known
>> amount of contiguous physical space", I really meant "move pages in order to
>> recover an amount of contiguous physical space". But I still think the rest of
>> what I said applies; wouldn't you be more likely to reach your goal quicker if
>> you sort by PFN?
>
> Not always. Suppose the in-use folios on the left are order-2, order-2,
> order-4 (all contiguous in one pageblock) and the free pages on the right are
> order-4 (pageblock N), order-2, order-2 (pageblock N-1); they do not form a
> single order-8 free page, since there are in-use folios in the middle. Going
> in PFN order will not get you an order-8 free page, because the first order-4
> free page will be split into two order-2 pages for the first two order-2
> in-use folios. But if you migrate in descending order of in-use folio orders,
> you can get an order-8 free page at the end.
>
> The patchset minimizes free page splits to avoid the situation described above,
> since once a high order free page is split, the opportunity of migrating a high order
> in-use folio into it is gone and hardly recoverable.

OK I get it now - thanks!

>
>
>>> The goal of compaction is to get a high order free page without freeing existing
>>> pages to avoid potential high cost IO operations. If compaction does not work,
>>> page reclaim would free pages to get us there (and potentially another follow-up
>>> compaction). So either pages are migrated or stay where they are during compaction.
>>>
>>> BTW compaction works by scanning in-use pages from lower PFN to higher PFN,
>>> and free pages from higher PFN to lower PFN until two scanners meet in the middle.
>>>
>>> --
>>> Best Regards,
>>> Yan, Zi
>
>
> Best Regards,
> Yan, Zi

2023-10-10 16:48:51

by Zi Yan

[permalink] [raw]
Subject: Re: [RFC PATCH 0/4] Enable >0 order folio memory compaction

On 10 Oct 2023, at 2:08, Huang, Ying wrote:

> Something went wrong with my mailbox. Sorry if you received duplicated
> mail.
>
> Zi Yan <[email protected]> writes:
>
>> On 9 Oct 2023, at 3:12, Huang, Ying wrote:
>>
>>> Hi, Zi,
>>>
>>> Thanks for your patch!
>>>
>>> Zi Yan <[email protected]> writes:
>>>
>>>> From: Zi Yan <[email protected]>
>>>>
>>>> Hi all,
>>>>
>>>> This patchset enables >0 order folio memory compaction, which is one of
>>>> the prerequisites for large folio support[1]. It is on top of
>>>> mm-everything-2023-09-11-22-56.
>>>>
>>>> Overview
>>>> ===
>>>>
>>>> To support >0 order folio compaction, the patchset changes how free pages used
>>>> for migration are kept during compaction.
>>>
>>> migrate_pages() can split the large folio on allocation failure. So
>>> the minimal implementation could be
>>>
>>> - allow to migrate large folios in compaction
>>> - return -ENOMEM for order > 0 in compaction_alloc()
>>>
>>> The performance may not be desirable. But that may be a baseline for
>>> further optimization.
>>
>> I would imagine it might cause a regression since compaction might gradually
>> split high order folios in the system.
>
> I would not call it a pure regression, since large folios can still be
> migrated during compaction with that approach, but it's possible that
> this hurts performance.
>
> Anyway, this can be a not-so-good minimal baseline.
>
>> But I can move Patch 4 first to make this the baseline and see how
>> system performance changes.
>
> Thanks!
>
>>>
>>> And, if we can measure the performance for each step of optimization,
>>> that will be even better.
>>
>> Do you have any benchmark in mind for the performance tests? vm-scalability?
>
> I remember Mel Gorman has done some tests for defragmentation before.
> But that's for order-0 pages.

OK, I will try to find that.

>
>>>> Free pages used to be split into
>>>> order-0 pages that are post allocation processed (i.e., PageBuddy flag cleared,
>>>> page order stored in page->private is zeroed, and page reference is set to 1).
>>>> Now all free pages are kept in a MAX_ORDER+1 array of page lists based
>>>> on their order without post allocation process. When migrate_pages() asks for
>>>> a new page, one of the free pages, based on the requested page order, is
>>>> then processed and given out.
>>>>
>>>>
>>>> Optimizations
>>>> ===
>>>>
>>>> 1. Free page split is added to increase migration success rate in case
>>>> a source page does not have a matched free page in the free page lists.
>>>> Free page merge is possible but not implemented, since existing
>>>> PFN-based buddy page merge algorithm requires the identification of
>>>> buddy pages, but free pages kept for memory compaction cannot have
>>>> PageBuddy set to avoid confusing other PFN scanners.
>>>>
>>>> 2. Sort source pages in ascending order before migration is added to
>>>
>>> Trivial.
>>>
>>> s/ascending/descending/
>>>
>>>> reduce free page split. Otherwise, high order free pages might be
>>>> prematurely split, causing undesired high order folio migration failures.
>>>>
>>>>
>>>> TODOs
>>>> ===
>>>>
>>>> 1. Refactor free page post allocation and free page preparation code so
>>>> that compaction_alloc() and compaction_free() can call functions instead
>>>> of hard coding.
>>>>
>>>> 2. One possible optimization is to allow migrate_pages() to continue
>>>> even if get_new_folio() returns NULL. In general, that means there is
>>>> not enough memory. But in >0 order folio compaction case, that means
>>>> there is no suitable free page at source page order. It might be better
>>>> to skip that page and finish the rest of migration to achieve a better
>>>> compaction result.
>>>
>>> We can split the source folio if get_new_folio() returns NULL. So, do
>>> we really need this?
>>
>> It depends. The situation where it helps is when the system is about to
>> allocate a high order free page and triggers compaction: it may be possible
>> to get the high order free page by migrating a bunch of base pages instead
>> of splitting an existing high order folio.
>>
>>>
>>> In general, we may reconsider all further optimizations given splitting
>>> is available already.
>>
>> In my mind, split should be avoided as much as possible.
>
> If so, should we use the "nosplit" logic in migrate_pages_batch() in some
> situations?

A possible future optimization.

>
>> But it really depends
>> on the actual situation, e.g., how much effort and cost compaction is willing
>> to pay to get memory defragmented. If the system really wants to get a high
>> order free page at any cost, split can be used without any issue. But
>> applications might lose performance because existing large folios are split
>> just to form a new one.
>
> Is it possible that splitting is desirable in some situation? For
> example, allocate some large DMA buffers at the cost of large anonymous
> folios?

Sure. There are definitely cases where splitting is better than not splitting.
But let's leave that until large anonymous folios are deployed.

>
>> Like I said in the email, there are tons of optimizations and policies for us
>> to explore. We can start with the bare minimum support (if no performance
>> regression is observed, we can even start with split all high folios like you
>> suggested) and add optimizations one by one.
>
> Sounds good to me! Thanks!
>
>>>
>>>> 3. Another possible optimization is to enable free page merge. It is
>>>> possible that a to-be-migrated page causes free page split then fails to
>>>> migrate eventually. We would lose a high order free page without free
>>>> page merge function. But a way of identifying free pages for memory
>>>> compaction is needed to reuse existing PFN-based buddy page merge.
>>>>
>>>> 4. The implemented >0 order folio compaction algorithm is quite naive
>>>> and does not consider all possible situations. A better algorithm can
>>>> improve compaction success rate.
>>>>
>>>>
>>>> Feel free to give comments and ask questions.
>>>>
>>>> Thanks.
>>>>
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/[email protected]/
>>>>
>>>> Zi Yan (4):
>>>> mm/compaction: add support for >0 order folio memory compaction.
>>>> mm/compaction: optimize >0 order folio compaction with free page
>>>> split.
>>>> mm/compaction: optimize >0 order folio compaction by sorting source
>>>> pages.
>>>> mm/compaction: enable compacting >0 order folios.
>>>>
>>>> mm/compaction.c | 205 +++++++++++++++++++++++++++++++++++++++---------
>>>> mm/internal.h | 7 +-
>>>> 2 files changed, 176 insertions(+), 36 deletions(-)
>
> --
> Best Regards,
> Huang, Ying


--
Best Regards,
Yan, Zi

