2023-10-20 15:57:49

by Raghavendra K T

Subject: [PATCH V1 0/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

The NUMA balancing code that updates PTEs by allowing an unconditional
scan based on the value of the process's mm numa_scan_seq is not perfect.

A more detailed description is in patch 1.

The debug patch below was used to identify the corner case.

Detailed Result: (only part of the result is included
in patch 1 to save space in the commit log)

SUT: AMD EPYC Milan with two NUMA nodes, 256 CPUs.

Base kernel: upstream 6.6-rc6 (dd72f9c7e512) with Mel's patch series
from tip/sched/core [1] applied.

Summary: Some benchmarks improve. System time increases due to the
additional scanning, but elapsed time shows a gain.

However, there is also some overhead for benchmarks such as NUMA01.

kernbench
========== base patched
Amean user-128 13799.58 ( 0.00%) 13789.86 * 0.07%*
Amean syst-128 3280.80 ( 0.00%) 3249.67 * 0.95%*
Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*

Duration User 41404.28 41375.08
Duration System 9862.22 9768.48
Duration Elapsed 519.87 518.72

Ops NUMA PTE updates 1041416.00 831536.00
Ops NUMA hint faults 263296.00 220966.00
Ops NUMA pages migrated 258021.00 212769.00
Ops AutoNUMA cost 1328.67 1114.69

autonumabench

NUMA01_THREADLOCAL
==================
Amean syst-NUMA01_THREADLOCAL 10.65 ( 0.00%) 26.47 *-148.59%*
Amean elsp-NUMA01_THREADLOCAL 81.79 ( 0.00%) 67.74 * 17.18%*

Duration User 54832.73 47379.67
Duration System 75.00 185.75
Duration Elapsed 576.72 476.09

Ops NUMA PTE updates 394429.00 11121044.00
Ops NUMA hint faults 1001.00 8906404.00
Ops NUMA pages migrated 288.00 2998694.00
Ops AutoNUMA cost 7.77 44666.84

NUMA01
======
Amean syst-NUMA01 31.97 ( 0.00%) 52.95 * -65.62%*
Amean elsp-NUMA01 143.16 ( 0.00%) 150.81 * -5.34%*

Duration User 84839.49 91342.19
Duration System 224.26 371.12
Duration Elapsed 1005.64 1059.01

Ops NUMA PTE updates 33929508.00 50116313.00
Ops NUMA hint faults 34993820.00 52895783.00
Ops NUMA pages migrated 5456115.00 7441228.00
Ops AutoNUMA cost 175310.27 264971.11

NUMA02
=========
Amean syst-NUMA02 0.86 ( 0.00%) 0.86 * -0.50%*
Amean elsp-NUMA02 3.99 ( 0.00%) 3.82 * 4.40%*

Duration User 1186.06 1092.07
Duration System 6.44 6.47
Duration Elapsed 31.28 30.30

Ops NUMA PTE updates 776.00 731.00
Ops NUMA hint faults 527.00 490.00
Ops NUMA pages migrated 183.00 153.00
Ops AutoNUMA cost 2.64 2.46

Link: https://lore.kernel.org/linux-mm/[email protected]/T/

Raghavendra K T (1):
sched/numa: Fix mm numa_scan_seq based unconditional scan

include/linux/mm_types.h | 3 +++
kernel/sched/fair.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)

---8<---
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 010ba1b7cb0e..a4870b01c8a1 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -10,6 +10,30 @@
#include <linux/tracepoint.h>
#include <linux/binfmts.h>

+TRACE_EVENT(sched_vma_start_seq,
+
+ TP_PROTO(struct task_struct *t, struct vm_area_struct *vma, int start_seq),
+
+ TP_ARGS(t, vma, start_seq),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( void *, vma )
+ __field( int, start_seq )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
+ __entry->pid = t->pid;
+ __entry->vma = vma;
+ __entry->start_seq = start_seq;
+ ),
+
+ TP_printk("comm=%s pid=%d vma = %px start_seq=%d", __entry->comm, __entry->pid, __entry->vma,
+ __entry->start_seq)
+);
+
/*
* Tracepoint for calling kthread_stop, performed to end a kthread:
*/
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c8af3a7ccba7..e0c16ea8470b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3335,6 +3335,7 @@ static void task_numa_work(struct callback_head *work)
continue;

vma->numab_state->start_scan_seq = mm->numa_scan_seq;
+ trace_sched_vma_start_seq(p, vma, mm->numa_scan_seq);

vma->numab_state->next_scan = now +
msecs_to_jiffies(sysctl_numa_balancing_scan_delay);


--
2.34.1


2023-10-20 15:57:58

by Raghavendra K T

Subject: [PATCH V1 1/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic"),
NUMA balancing allows updating PTEs to trap NUMA hinting faults only if
the task had previously accessed the VMA. However, an unconditional scan
of VMAs is allowed during the initial phase after VMA creation, until the
process's mm numa_scan_seq reaches 2, even though the current task has
not accessed the VMA.

Rationale:
- Without the initial scan, subsequent PTE updates may never happen.
- Give all VMAs a fair opportunity to be scanned, so that the access
pattern of each VMA can subsequently be understood.

However, there is a corner case: if a VMA is created some time after the
process starts, the process's mm numa_scan_seq may already be greater
than 2.

For example, the distribution of mm numa_scan_seq values at VMA-creation
time while running the mmtests autonuma benchmark looks like:
start_seq=0 : 459
start_seq=2 : 138
start_seq=3 : 144
start_seq=4 : 8
start_seq=8 : 1
start_seq=9 : 1
As a result, those VMAs created after some time never receive an
unconditional PTE update.

Fix:
- Record the initial value of mm numa_scan_seq in the per-VMA
start_scan_seq field.
- Allow unconditional scanning until mm numa_scan_seq reaches
start_scan_seq + 2.

Result:
SUT: AMD EPYC Milan with two NUMA nodes, 256 CPUs.
base kernel: upstream 6.6-rc6 with Mel's patches [1] applied.

kernbench
========== base patched %gain
Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*

Duration User 41404.28 41375.08
Duration System 9862.22 9768.48
Duration Elapsed 519.87 518.72

Ops NUMA PTE updates 1041416.00 831536.00
Ops NUMA hint faults 263296.00 220966.00
Ops NUMA pages migrated 258021.00 212769.00
Ops AutoNUMA cost 1328.67 1114.69

autonumabench

NUMA01_THREADLOCAL
==================
Amean elsp-NUMA01_THREADLOCAL 81.79 (0.00%) 67.74 * 17.18%*

Duration User 54832.73 47379.67
Duration System 75.00 185.75
Duration Elapsed 576.72 476.09

Ops NUMA PTE updates 394429.00 11121044.00
Ops NUMA hint faults 1001.00 8906404.00
Ops NUMA pages migrated 288.00 2998694.00
Ops AutoNUMA cost 7.77 44666.84

Link: https://lore.kernel.org/linux-mm/[email protected]/T/

Signed-off-by: Raghavendra K T <[email protected]>
---
include/linux/mm_types.h | 3 +++
kernel/sched/fair.c | 4 +++-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 589f31ef2e84..679f076e3a91 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -576,6 +576,9 @@ struct vma_numab_state {
*/
unsigned long pids_active[2];

+ /* MM scan sequence ID when scan first started after VMA creation */
+ int start_scan_seq;
+
/*
* MM scan sequence ID when the VMA was last completely scanned.
* A VMA is not eligible for scanning if prev_scan_seq == numa_scan_seq
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5e03ced2b566..c8af3a7ccba7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3191,7 +3191,7 @@ static bool vma_is_accessed(struct mm_struct *mm, struct vm_area_struct *vma)
* This is also done to avoid any side effect of task scanning
* amplifying the unfairness of disjoint set of VMAs' access.
*/
- if (READ_ONCE(current->mm->numa_scan_seq) < 2)
+ if ((READ_ONCE(current->mm->numa_scan_seq) - vma->numab_state->start_scan_seq) < 2)
return true;

pids = vma->numab_state->pids_active[0] | vma->numab_state->pids_active[1];
@@ -3334,6 +3334,8 @@ static void task_numa_work(struct callback_head *work)
if (!vma->numab_state)
continue;

+ vma->numab_state->start_scan_seq = mm->numa_scan_seq;
+
vma->numab_state->next_scan = now +
msecs_to_jiffies(sysctl_numa_balancing_scan_delay);

--
2.34.1

2023-10-23 05:26:43

by Raghavendra K T

Subject: Re: [PATCH V1 0/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

On 10/20/2023 9:27 PM, Raghavendra K T wrote:
> NUMA balancing code that updates PTEs by allowing unconditional scan
> based on the value of processes' mm numa_scan_seq is not perfect.
>
> More description is in patch1.
>
> Have used the below patch to identify the corner case.
>
> Detailed Result: (Only part of the result is updated
> in patch1 to save space in commit log)
>
> Detailed Result:
>
> SUT: AMD EPYC Milan with 2 NUMA nodes 256 cpus.
>
> Base kernel: upstream 6.6-rc6 (dd72f9c7e512) with Mels patch-series
> from tip/sched/core [1] applied.
>
> Summary: Some benchmarks imrove. There is increase in system
> time due to additional scanning. But elapsed time shows gain.
>
> However there is also some overhead seen for benchmarks like NUMA01.
>
> kernbench
> ========== base patched
> Amean user-128 13799.58 ( 0.00%) 13789.86 * 0.07%*
> Amean syst-128 3280.80 ( 0.00%) 3249.67 * 0.95%*
> Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*
>
> Duration User 41404.28 41375.08
> Duration System 9862.22 9768.48
> Duration Elapsed 519.87 518.72
>
> Ops NUMA PTE updates 1041416.00 831536.00
> Ops NUMA hint faults 263296.00 220966.00
> Ops NUMA pages migrated 258021.00 212769.00
> Ops AutoNUMA cost 1328.67 1114.69
>
> autonumabench
>
> NUMA01_THREADLOCAL
> ==================
> Amean syst-NUMA01_THREADLOCAL 10.65 ( 0.00%) 26.47 *-148.59%*
> Amean elsp-NUMA01_THREADLOCAL 81.79 ( 0.00%) 67.74 * 17.18%*
>
> Duration User 54832.73 47379.67
> Duration System 75.00 185.75
> Duration Elapsed 576.72 476.09
>
> Ops NUMA PTE updates 394429.00 11121044.00
> Ops NUMA hint faults 1001.00 8906404.00
> Ops NUMA pages migrated 288.00 2998694.00
> Ops AutoNUMA cost 7.77 44666.84
>
> NUMA01
> =====
> Amean syst-NUMA01 31.97 ( 0.00%) 52.95 * -65.62%*
> Amean elsp-NUMA01 143.16 ( 0.00%) 150.81 * -5.34%*
>
> Duration User 84839.49 91342.19
> Duration System 224.26 371.12
> Duration Elapsed 1005.64 1059.01
>
> Ops NUMA PTE updates 33929508.00 50116313.00
> Ops NUMA hint faults 34993820.00 52895783.00
> Ops NUMA pages migrated 5456115.00 7441228.00
> Ops AutoNUMA cost 175310.27 264971.11
>
> NUMA02
> =========
> Amean syst-NUMA02 0.86 ( 0.00%) 0.86 * -0.50%*
> Amean elsp-NUMA02 3.99 ( 0.00%) 3.82 * 4.40%*
>
> Duration User 1186.06 1092.07
> Duration System 6.44 6.47
> Duration Elapsed 31.28 30.30
>
> Ops NUMA PTE updates 776.00 731.00
> Ops NUMA hint faults 527.00 490.00
> Ops NUMA pages migrated 183.00 153.00
> Ops AutoNUMA cost 2.64 2.46
>
> Link: https://lore.kernel.org/linux-mm/[email protected]/T/
>

I forgot to include the skip_vma_count trace results:

autonumabench: NUMA01_THREADLOCAL, 3 iterations

base:
inaccessible:13133
pid_inactive:15807
scan_delay:471
seq_completed:50
shared_ro:6983
unsuitable:3917

patched:
inaccessible:4727
pid_inactive:5119
scan_delay:455
seq_completed:7
shared_ro:2551
unsuitable:5402



> Raghavendra K T (1):
> sched/numa: Fix mm numa_scan_seq based unconditional scan
>
> include/linux/mm_types.h | 3 +++
> kernel/sched/fair.c | 4 +++-
> 2 files changed, 6 insertions(+), 1 deletion(-)
>
> ---8<---
> diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
> index 010ba1b7cb0e..a4870b01c8a1 100644
> --- a/include/trace/events/sched.h
> +++ b/include/trace/events/sched.h
> @@ -10,6 +10,30 @@
> #include <linux/tracepoint.h>
> #include <linux/binfmts.h>
>
> +TRACE_EVENT(sched_vma_start_seq,
> +
> + TP_PROTO(struct task_struct *t, struct vm_area_struct *vma, int start_seq),
> +
> + TP_ARGS(t, vma, start_seq),
> +
> + TP_STRUCT__entry(
> + __array( char, comm, TASK_COMM_LEN )
> + __field( pid_t, pid )
> + __field( void *, vma )
> + __field( int, start_seq )
> + ),
> +
> + TP_fast_assign(
> + memcpy(__entry->comm, t->comm, TASK_COMM_LEN);
> + __entry->pid = t->pid;
> + __entry->vma = vma;
> + __entry->start_seq = start_seq;
> + ),
> +
> + TP_printk("comm=%s pid=%d vma = %px start_seq=%d", __entry->comm, __entry->pid, __entry->vma,
> + __entry->start_seq)
> +);
> +
> /*
> * Tracepoint for calling kthread_stop, performed to end a kthread:
> */
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index c8af3a7ccba7..e0c16ea8470b 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3335,6 +3335,7 @@ static void task_numa_work(struct callback_head *work)
> continue;
>
> vma->numab_state->start_scan_seq = mm->numa_scan_seq;
> + trace_sched_vma_start_seq(p, vma, mm->numa_scan_seq);
>
> vma->numab_state->next_scan = now +
> msecs_to_jiffies(sysctl_numa_balancing_scan_delay);
>
>

2023-10-27 05:25:25

by Raghavendra K T

Subject: Re: [PATCH V1 0/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

On 10/20/2023 9:27 PM, Raghavendra K T wrote:
> NUMA balancing code that updates PTEs by allowing unconditional scan
> based on the value of processes' mm numa_scan_seq is not perfect.
>
> More description is in patch1.
>
> Have used the below patch to identify the corner case.
>
> Detailed Result: (Only part of the result is updated
> in patch1 to save space in commit log)
>

Gentle ping to check if there are any concerns / comments
on the patch :)

Thanks and Regards
- Raghu

2023-11-01 09:22:31

by Mel Gorman

[permalink] [raw]
Subject: Re: [PATCH V1 1/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

On Fri, Oct 20, 2023 at 09:27:46PM +0530, Raghavendra K T wrote:
> Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
>
> NUMA Balancing allows updating PTEs to trap NUMA hinting faults if the
> task had previously accessed VMA. However unconditional scan of VMAs are
> allowed during initial phase of VMA creation until process's
> mm numa_scan_seq reaches 2 even though current task had not accessed VMA.
>
> Rationale:
> - Without initial scan subsequent PTE update may never happen.
> - Give fair opportunity to all the VMAs to be scanned and subsequently
> understand the access pattern of all the VMAs.
>
> But it has a corner case where, if a VMA is created after some time,
> process's mm numa_scan_seq could be already greater than 2.
>
> For e.g., values of mm numa_scan_seq when VMAs are created by running
> mmtest autonuma benchmark briefly looks like:
> start_seq=0 : 459
> start_seq=2 : 138
> start_seq=3 : 144
> start_seq=4 : 8
> start_seq=8 : 1
> start_seq=9 : 1
> This results in no unconditional PTE updates for those VMAs created after
> some time.
>
> Fix:
> - Note down the initial value of mm numa_scan_seq in per VMA start_seq.
> - Allow unconditional scan till start_seq + 2.
>
> Result:
> SUT: AMD EPYC Milan with 2 NUMA nodes 256 cpus.
> base kernel: upstream 6.6-rc6 with Mels patches [1] applied.
>
> kernbench
> ========== base patched %gain
> Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*
>
> Duration User 41404.28 41375.08
> Duration System 9862.22 9768.48
> Duration Elapsed 519.87 518.72
>
> Ops NUMA PTE updates 1041416.00 831536.00
> Ops NUMA hint faults 263296.00 220966.00
> Ops NUMA pages migrated 258021.00 212769.00
> Ops AutoNUMA cost 1328.67 1114.69
>
> autonumabench
>
> NUMA01_THREADLOCAL
> ==================
> Amean elsp-NUMA01_THREADLOCAL 81.79 (0.00%) 67.74 * 17.18%*
>
> Duration User 54832.73 47379.67
> Duration System 75.00 185.75
> Duration Elapsed 576.72 476.09
>
> Ops NUMA PTE updates 394429.00 11121044.00
> Ops NUMA hint faults 1001.00 8906404.00
> Ops NUMA pages migrated 288.00 2998694.00
> Ops AutoNUMA cost 7.77 44666.84
>
> Link: https://lore.kernel.org/linux-mm/[email protected]/T/
>
> Signed-off-by: Raghavendra K T <[email protected]>

Acked-by: Mel Gorman <[email protected]>

--
Mel Gorman
SUSE Labs

2023-11-01 10:31:57

by Peter Zijlstra

Subject: Re: [PATCH V1 1/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

On Wed, Nov 01, 2023 at 09:21:01AM +0000, Mel Gorman wrote:
> On Fri, Oct 20, 2023 at 09:27:46PM +0530, Raghavendra K T wrote:
> > Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
> >
> > NUMA Balancing allows updating PTEs to trap NUMA hinting faults if the
> > task had previously accessed VMA. However unconditional scan of VMAs are
> > allowed during initial phase of VMA creation until process's
> > mm numa_scan_seq reaches 2 even though current task had not accessed VMA.
> >
> > Rationale:
> > - Without initial scan subsequent PTE update may never happen.
> > - Give fair opportunity to all the VMAs to be scanned and subsequently
> > understand the access pattern of all the VMAs.
> >
> > But it has a corner case where, if a VMA is created after some time,
> > process's mm numa_scan_seq could be already greater than 2.
> >
> > For e.g., values of mm numa_scan_seq when VMAs are created by running
> > mmtest autonuma benchmark briefly looks like:
> > start_seq=0 : 459
> > start_seq=2 : 138
> > start_seq=3 : 144
> > start_seq=4 : 8
> > start_seq=8 : 1
> > start_seq=9 : 1
> > This results in no unconditional PTE updates for those VMAs created after
> > some time.
> >
> > Fix:
> > - Note down the initial value of mm numa_scan_seq in per VMA start_seq.
> > - Allow unconditional scan till start_seq + 2.
> >
> > Result:
> > SUT: AMD EPYC Milan with 2 NUMA nodes 256 cpus.
> > base kernel: upstream 6.6-rc6 with Mels patches [1] applied.
> >
> > kernbench
> > ========== base patched %gain
> > Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*
> >
> > Duration User 41404.28 41375.08
> > Duration System 9862.22 9768.48
> > Duration Elapsed 519.87 518.72
> >
> > Ops NUMA PTE updates 1041416.00 831536.00
> > Ops NUMA hint faults 263296.00 220966.00
> > Ops NUMA pages migrated 258021.00 212769.00
> > Ops AutoNUMA cost 1328.67 1114.69
> >
> > autonumabench
> >
> > NUMA01_THREADLOCAL
> > ==================
> > Amean elsp-NUMA01_THREADLOCAL 81.79 (0.00%) 67.74 * 17.18%*
> >
> > Duration User 54832.73 47379.67
> > Duration System 75.00 185.75
> > Duration Elapsed 576.72 476.09
> >
> > Ops NUMA PTE updates 394429.00 11121044.00
> > Ops NUMA hint faults 1001.00 8906404.00
> > Ops NUMA pages migrated 288.00 2998694.00
> > Ops AutoNUMA cost 7.77 44666.84
> >
> > Link: https://lore.kernel.org/linux-mm/[email protected]/T/
> >
> > Signed-off-by: Raghavendra K T <[email protected]>
>
> Acked-by: Mel Gorman <[email protected]>

Thanks, will queue for the next merge window (6.8 I think that is) once
6.7-rc1 comes around.

2023-11-02 05:17:54

by Raghavendra K T

Subject: Re: [PATCH V1 1/1] sched/numa: Fix mm numa_scan_seq based unconditional scan

On 11/1/2023 4:01 PM, Peter Zijlstra wrote:
> On Wed, Nov 01, 2023 at 09:21:01AM +0000, Mel Gorman wrote:
>> On Fri, Oct 20, 2023 at 09:27:46PM +0530, Raghavendra K T wrote:
>>> Since commit fc137c0ddab2 ("sched/numa: enhance vma scanning logic")
>>>
>>> NUMA Balancing allows updating PTEs to trap NUMA hinting faults if the
>>> task had previously accessed VMA. However unconditional scan of VMAs are
>>> allowed during initial phase of VMA creation until process's
>>> mm numa_scan_seq reaches 2 even though current task had not accessed VMA.
>>>
>>> Rationale:
>>> - Without initial scan subsequent PTE update may never happen.
>>> - Give fair opportunity to all the VMAs to be scanned and subsequently
>>> understand the access pattern of all the VMAs.
>>>
>>> But it has a corner case where, if a VMA is created after some time,
>>> process's mm numa_scan_seq could be already greater than 2.
>>>
>>> For e.g., values of mm numa_scan_seq when VMAs are created by running
>>> mmtest autonuma benchmark briefly looks like:
>>> start_seq=0 : 459
>>> start_seq=2 : 138
>>> start_seq=3 : 144
>>> start_seq=4 : 8
>>> start_seq=8 : 1
>>> start_seq=9 : 1
>>> This results in no unconditional PTE updates for those VMAs created after
>>> some time.
>>>
>>> Fix:
>>> - Note down the initial value of mm numa_scan_seq in per VMA start_seq.
>>> - Allow unconditional scan till start_seq + 2.
>>>
>>> Result:
>>> SUT: AMD EPYC Milan with 2 NUMA nodes 256 cpus.
>>> base kernel: upstream 6.6-rc6 with Mels patches [1] applied.
>>>
>>> kernbench
>>> ========== base patched %gain
>>> Amean elsp-128 165.09 ( 0.00%) 164.78 * 0.19%*
>>>
>>> Duration User 41404.28 41375.08
>>> Duration System 9862.22 9768.48
>>> Duration Elapsed 519.87 518.72
>>>
>>> Ops NUMA PTE updates 1041416.00 831536.00
>>> Ops NUMA hint faults 263296.00 220966.00
>>> Ops NUMA pages migrated 258021.00 212769.00
>>> Ops AutoNUMA cost 1328.67 1114.69
>>>
>>> autonumabench
>>>
>>> NUMA01_THREADLOCAL
>>> ==================
>>> Amean elsp-NUMA01_THREADLOCAL 81.79 (0.00%) 67.74 * 17.18%*
>>>
>>> Duration User 54832.73 47379.67
>>> Duration System 75.00 185.75
>>> Duration Elapsed 576.72 476.09
>>>
>>> Ops NUMA PTE updates 394429.00 11121044.00
>>> Ops NUMA hint faults 1001.00 8906404.00
>>> Ops NUMA pages migrated 288.00 2998694.00
>>> Ops AutoNUMA cost 7.77 44666.84
>>>
>>> Link: https://lore.kernel.org/linux-mm/[email protected]/T/
>>>
>>> Signed-off-by: Raghavendra K T <[email protected]>
>>
>> Acked-by: Mel Gorman <[email protected]>
>
> Thanks, will queue for the next merge window (6.8 I think that is) once
> 6.7-rc1 comes around.

Thank you Mel, PeterZ.

Meanwhile, I will check whether extending the history (suggested by
PeterZ) helps on this changed baseline, as well as the implications of
extending the number of bits used for PIDs (suggested by Ingo),
especially on larger machines, and will come back if I find anything
interesting.

Thanks and Regards
- Raghu