Hi,
Currently, on our ARM servers with NUMA enabled, we found that the
cross-die memory access latency is noticeably higher and can significantly
impact workload performance, so on these servers we rely on NUMA balancing
to avoid cross-die accesses. I previously posted a patchset[1] to support
speculative NUMA faults, which improves NUMA balancing performance by
exploiting the principle of data locality. Moreover, Huang Ying's
patchset[2] introduced batch migration as a way to reduce the cost of TLB
flushes, which can also benefit migrating multiple pages at once during
NUMA balancing.
So we plan to also support batch migration in do_numa_page() to further
improve NUMA balancing performance. Before adding a more complicated batch
migration algorithm, however, some cleanup and preparation work needs to be
done first, which is what this patch set does. In short, this patchset
extends the migrate_misplaced_page() interface to support batch migration,
with no functional changes intended.
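
To make the extended interface concrete, here is a minimal sketch
(illustration only; how multiple pages are gathered is left to the
follow-up batch migration work) of the intended calling pattern, where the
whole batch is migrated with a single call:

	LIST_HEAD(migratepages);
	int nr_succeeded;

	/* Isolate each misplaced page and collect it on a local list. */
	list_add(&page->lru, &migratepages);

	/* Migrate the whole batch to the target node in one call. */
	nr_succeeded = migrate_misplaced_page(&migratepages, vma,
					      page_nid, target_nid);
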
In addition, these cleanups can also benefit NUMA balancing of compound
pages, which was discussed in a previous thread[3]. IIUC, for compound page
NUMA balancing it is possible that only some of the pages are migrated
successfully, so it is necessary for migrate_misplaced_page() to return the
number of pages that were actually migrated.
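
For example (a sketch based on the hunks in patch 4; TNF_MIGRATE_FAIL is
the existing NUMA fault flag), a caller can tell from the return value
whether any pages of the batch were migrated:

	nr_succeeded = migrate_misplaced_page(&migratepages, vma,
					      page_nid, target_nid);
	if (nr_succeeded) {
		/* At least part of the batch moved to the target node. */
		flags |= TNF_MIGRATED;
		page_nid = target_nid;
	} else {
		flags |= TNF_MIGRATE_FAIL;
	}
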
This series is based on the latest mm-unstable(d226b59b30cc).
[1] https://lore.kernel.org/lkml/[email protected]/t/#mc45929849b5d0e29b5fdd9d50425f8e95b8f2563
[2] https://lore.kernel.org/all/[email protected]/T/#u
[3] https://lore.kernel.org/all/[email protected]/
Changes from v1:
- Move page validation into a new function, as suggested by Huang Ying.
- Change numamigrate_isolate_page() to return a boolean.
- Update some commit messages.
Baolin Wang (4):
mm: migrate: factor out migration validation into
numa_page_can_migrate()
mm: migrate: move the numamigrate_isolate_page() into do_numa_page()
mm: migrate: change migrate_misplaced_page() to support multiple pages
migration
mm: migrate: change to return the number of pages migrated
successfully
include/linux/migrate.h | 15 +++++++---
mm/huge_memory.c | 23 +++++++++++++--
mm/internal.h | 1 +
mm/memory.c | 43 ++++++++++++++++++++++++++-
mm/migrate.c | 64 +++++++++--------------------------------
5 files changed, 88 insertions(+), 58 deletions(-)
--
2.39.3
Change migrate_misplaced_page() to return the number of pages migrated
successfully, which can be used to calculate how many pages failed to
migrate in batch migration. For compound page NUMA balancing support, it is
possible that only some of the pages are migrated successfully, so it is
necessary for migrate_misplaced_page() to return the number of pages that
were actually migrated.
Signed-off-by: Baolin Wang <[email protected]>
---
mm/huge_memory.c | 9 +++++----
mm/memory.c | 4 +++-
mm/migrate.c | 5 +----
3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4401a3493544..951f73d6b5bf 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1494,10 +1494,11 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
int page_nid = NUMA_NO_NODE;
int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
- bool migrated = false, writable = false;
+ bool writable = false;
int flags = 0;
pg_data_t *pgdat;
LIST_HEAD(migratepages);
+ int nr_succeeded;
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) {
@@ -1554,9 +1555,9 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
}
list_add(&page->lru, &migratepages);
- migrated = migrate_misplaced_page(&migratepages, vma,
- page_nid, target_nid);
- if (migrated) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma,
+ page_nid, target_nid);
+ if (nr_succeeded) {
flags |= TNF_MIGRATED;
page_nid = target_nid;
} else {
diff --git a/mm/memory.c b/mm/memory.c
index 9e417e8dd5d5..2773cd804ee9 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4771,6 +4771,7 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
int flags = 0;
pg_data_t *pgdat;
LIST_HEAD(migratepages);
+ int nr_succeeded;
/*
* The "pte" at this point cannot be used safely without
@@ -4854,7 +4855,8 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
list_add(&page->lru, &migratepages);
/* Migrate to the requested node */
- if (migrate_misplaced_page(&migratepages, vma, page_nid, target_nid)) {
+ nr_succeeded = migrate_misplaced_page(&migratepages, vma, page_nid, target_nid);
+ if (nr_succeeded) {
page_nid = target_nid;
flags |= TNF_MIGRATED;
} else {
diff --git a/mm/migrate.c b/mm/migrate.c
index fae7224b8e64..5435cfb225ab 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -2523,7 +2523,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
int source_nid, int target_nid)
{
pg_data_t *pgdat = NODE_DATA(target_nid);
- int migrated = 1;
int nr_remaining;
unsigned int nr_succeeded;
@@ -2533,8 +2532,6 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
if (nr_remaining) {
if (!list_empty(migratepages))
putback_movable_pages(migratepages);
-
- migrated = 0;
}
if (nr_succeeded) {
count_vm_numa_events(NUMA_PAGE_MIGRATE, nr_succeeded);
@@ -2543,7 +2540,7 @@ int migrate_misplaced_page(struct list_head *migratepages, struct vm_area_struct
nr_succeeded);
}
BUG_ON(!list_empty(migratepages));
- return migrated;
+ return nr_succeeded;
}
#endif /* CONFIG_NUMA_BALANCING */
#endif /* CONFIG_NUMA */
--
2.39.3
On 8/24/2023 12:51 PM, Huang, Ying wrote:
> Baolin Wang <[email protected]> writes:
>
>> On 8/22/2023 10:47 AM, Huang, Ying wrote:
>>> Baolin Wang <[email protected]> writes:
>>>
>>>> Hi,
>>>>
>>>> Currently, on our ARM servers with NUMA enabled, we found that the
>>>> cross-die memory access latency is noticeably higher and can
>>>> significantly impact workload performance, so on these servers we rely
>>>> on NUMA balancing to avoid cross-die accesses. I previously posted a
>>>> patchset[1] to support speculative NUMA faults, which improves NUMA
>>>> balancing performance by exploiting the principle of data locality.
>>>> Moreover, Huang Ying's patchset[2] introduced batch migration as a way
>>>> to reduce the cost of TLB flushes, which can also benefit migrating
>>>> multiple pages at once during NUMA balancing.
>>>>
>>>> So we plan to also support batch migration in do_numa_page() to
>>>> further improve NUMA balancing performance. Before adding a more
>>>> complicated batch migration algorithm, however, some cleanup and
>>>> preparation work needs to be done first, which is what this patch set
>>>> does. In short, this patchset extends the migrate_misplaced_page()
>>>> interface to support batch migration, with no functional changes
>>>> intended.
>>>>
>>>> In addition, these cleanups can also benefit NUMA balancing of
>>>> compound pages, which was discussed in a previous thread[3]. IIUC, for
>>>> compound page NUMA balancing it is possible that only some of the
>>>> pages are migrated successfully, so it is necessary for
>>>> migrate_misplaced_page() to return the number of pages that were
>>>> actually migrated.
>>> But I don't see the returned number being used except as a bool now.
>>
>> As I said above, this is a preparation for batch migration and
>> compound page NUMA balancing in the future.
>>
>> In addition, after looking into THP NUMA migration, I found this
>> change is also necessary for THP migration, since it is possible that
>> some subpages are migrated successfully if the THP is split. So the
>> THP NUMA fault statistics below are not always correct:
>>
>> 	if (page_nid != NUMA_NO_NODE)
>> 		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
>> 				flags);
>>
>> I will try to fix this in the next version.
>
> IIUC, THP will not be split for NUMA balancing. Please check the
> nosplit logic in migrate_pages_batch().
>
> bool nosplit = (reason == MR_NUMA_MISPLACED);
Yes, I overlooked this. Thanks for the reminder.
>
>>> Per my understanding, I still don't find much value in the changes
>>> except as preparation for batch migration in NUMA balancing. So I still
>>
>> IMO, only patch 3 is purely a preparation for batch migration; the
>> other patches are cleanups for migrate_misplaced_page(). I can drop the
>> preparation patches from this series and revise the commit messages.
>>
>>> think it's better to wait for the whole series, where we can check
>>> why these changes are necessary for batch migration. And I expect that
>>> you will provide some numbers to justify the batch migration, including
>>> the pros and cons.
>>> --
>>> Best Regards,
>>> Huang, Ying
On 8/22/2023 10:47 AM, Huang, Ying wrote:
> Baolin Wang <[email protected]> writes:
>
>> Hi,
>>
>> Currently, on our ARM servers with NUMA enabled, we found that the
>> cross-die memory access latency is noticeably higher and can
>> significantly impact workload performance, so on these servers we rely
>> on NUMA balancing to avoid cross-die accesses. I previously posted a
>> patchset[1] to support speculative NUMA faults, which improves NUMA
>> balancing performance by exploiting the principle of data locality.
>> Moreover, Huang Ying's patchset[2] introduced batch migration as a way
>> to reduce the cost of TLB flushes, which can also benefit migrating
>> multiple pages at once during NUMA balancing.
>>
>> So we plan to also support batch migration in do_numa_page() to
>> further improve NUMA balancing performance. Before adding a more
>> complicated batch migration algorithm, however, some cleanup and
>> preparation work needs to be done first, which is what this patch set
>> does. In short, this patchset extends the migrate_misplaced_page()
>> interface to support batch migration, with no functional changes
>> intended.
>>
>> In addition, these cleanups can also benefit NUMA balancing of
>> compound pages, which was discussed in a previous thread[3]. IIUC, for
>> compound page NUMA balancing it is possible that only some of the
>> pages are migrated successfully, so it is necessary for
>> migrate_misplaced_page() to return the number of pages that were
>> actually migrated.
>
> But I don't see the returned number being used except as a bool now.
As I said above, this is a preparation for batch migration and compound
page NUMA balancing in the future.
In addition, after looking into THP NUMA migration, I found this change
is also necessary for THP migration, since it is possible that some
subpages are migrated successfully if the THP is split. So the THP NUMA
fault statistics below are not always correct:
	if (page_nid != NUMA_NO_NODE)
		task_numa_fault(last_cpupid, page_nid, HPAGE_PMD_NR,
				flags);
I will try to fix this in the next version.
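
Something like the below (an untested sketch of the idea; if the THP is
never actually split for NUMA balancing, the existing accounting is
already correct), where nr_succeeded is the count returned by
migrate_misplaced_page(), would count only the subpages that moved:

	if (page_nid != NUMA_NO_NODE)
		/* Count the subpages that migrated; all of them on failure. */
		task_numa_fault(last_cpupid, page_nid,
				nr_succeeded ? nr_succeeded : HPAGE_PMD_NR,
				flags);
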
> Per my understanding, I still don't find much value in the changes
> except as preparation for batch migration in NUMA balancing. So I still
IMO, only patch 3 is purely a preparation for batch migration; the other
patches are cleanups for migrate_misplaced_page(). I can drop the
preparation patches from this series and revise the commit messages.
> think it's better to wait for the whole series, where we can check why
> these changes are necessary for batch migration. And I expect that you
> will provide some numbers to justify the batch migration, including the
> pros and cons.
>
> --
> Best Regards,
> Huang, Ying