2023-04-25 10:08:16

by Zhao Mengmeng

Subject: [PATCH v1] sched/fair: fix inconsistency in update_task_scan_period

From: Zhao Mengmeng <[email protected]>

When calculating the numa_scan_period diff, the code and the
comment are inconsistent: the comment says the shared faults
ratio is used, but the code uses the private faults ratio. This
patch changes the code to match the comment.

Signed-off-by: Zhao Mengmeng <[email protected]>
---
kernel/sched/fair.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..73cc83128072 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2362,7 +2362,7 @@ static void update_task_scan_period(struct task_struct *p,
unsigned long shared, unsigned long private)
{
unsigned int period_slot;
- int lr_ratio, ps_ratio;
+ int lr_ratio, sp_ratio;
int diff;

unsigned long remote = p->numa_faults_locality[0];
@@ -2393,24 +2393,24 @@ static void update_task_scan_period(struct task_struct *p,
*/
period_slot = DIV_ROUND_UP(p->numa_scan_period, NUMA_PERIOD_SLOTS);
lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
- ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
+ sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);

- if (ps_ratio >= NUMA_PERIOD_THRESHOLD) {
+ if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
/*
* Most memory accesses are local. There is no need to
* do fast NUMA scanning, since memory is already local.
*/
- int slot = ps_ratio - NUMA_PERIOD_THRESHOLD;
+ int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
if (!slot)
slot = 1;
diff = slot * period_slot;
- } else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
+ } else if (sp_ratio >= NUMA_PERIOD_THRESHOLD) {
/*
* Most memory accesses are shared with other tasks.
* There is no point in continuing fast NUMA scanning,
* since other tasks may just move the memory elsewhere.
*/
- int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
+ int slot = sp_ratio - NUMA_PERIOD_THRESHOLD;
if (!slot)
slot = 1;
diff = slot * period_slot;
@@ -2420,7 +2420,7 @@ static void update_task_scan_period(struct task_struct *p,
* yet they are not on the local NUMA node. Speed up
* NUMA scanning to get the memory moved over.
*/
- int ratio = max(lr_ratio, ps_ratio);
+ int ratio = max(lr_ratio, sp_ratio);
diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
}

--
2.38.1


2023-04-25 10:26:20

by Peter Zijlstra

Subject: Re: [PATCH v1] sched/fair: fix inconsistency in update_task_scan_period

On Tue, Apr 25, 2023 at 06:02:04AM -0400, [email protected] wrote:
> From: Zhao Mengmeng <[email protected]>
>
> When calculating the numa_scan_period diff, the code and the
> comment are inconsistent: the comment says the shared faults
> ratio is used, but the code uses the private faults ratio. This
> patch changes the code to match the comment.

So for some reason you think the comment is correct? You also don't
discuss the performance changes caused by changing the code.

2023-04-26 02:15:17

by Zhao Mengmeng

Subject: Re: [PATCH v1] sched/fair: fix inconsistency in update_task_scan_period

On 2023/4/25 18:24, Peter Zijlstra wrote:
> On Tue, Apr 25, 2023 at 06:02:04AM -0400, [email protected] wrote:
>> From: Zhao Mengmeng <[email protected]>
>>
>> When calculating the numa_scan_period diff, the code and the
>> comment are inconsistent: the comment says the shared faults
>> ratio is used, but the code uses the private faults ratio. This
>> patch changes the code to match the comment.
>
> So for some reason you think the comment is correct? You also don't
> discuss the performance changes caused by changing the code.

Sorry for the repeated mail and thanks for your comments.

First, I'd like to add some background. When I was studying the
numa-balancing code and found this inconsistency, my first feeling was
the same as yours: the comment is outdated and does not match the
code, so I intended to change the comment. But after some digging
through the git history, I changed my mind.

update_task_scan_period() was added in commit 04bb2f9475054
("sched/numa: Adjust scan rate in task_numa_placement"), which set up
the basis. Though the detailed logic inside the function has changed a
lot since then, its intended overall effect has stayed the same:
"""
Increase the scan period (slow down scanning) if the majority of our
memory is already on our local node, or if the majority of the page
accesses are shared with other processes
"""
Later, in commit 37ec97deb3a8c ("sched/numa: Slow down scan rate if
shared faults dominate"), the author states:
"""
However, with the current code, all a high ratio of shared accesses
does is slow down the rate at which scanning is made faster.

This patch changes things so either lots of shared accesses or
lots of local accesses will slow down scanning, and numa scanning
is sped up only when there are lots of private faults on remote
memory pages.
"""
But the actual code logic doesn't reflect the intention stated in that
commit log, and that is the code the kernel carries today. Based on
this, I changed the code and made this patch; a small stand-alone
sketch of the difference is included below. That is *the reason I
think the comment is correct*.
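
To make the difference concrete, below is a small stand-alone
userspace sketch (my own illustration, not the kernel code itself;
period_slot is fixed to 1 and the fault counts in main() are made up,
while the two constants match kernel/sched/fair.c) that runs the
pre-patch and patched branch orders side by side. With faults that are
mostly remote and mostly shared, the pre-patch order returns a
negative diff (it speeds scanning up), while the patched order returns
a positive diff (it slows scanning down), which is what both commit
logs above ask for.

#include <stdio.h>
#include <stdbool.h>

#define NUMA_PERIOD_SLOTS	10
#define NUMA_PERIOD_THRESHOLD	7

/*
 * Simplified model of the diff computed in update_task_scan_period().
 * A positive return value slows scanning down, a negative one speeds
 * it up.  "patched" selects the branch order proposed in this patch.
 */
static int scan_period_diff(unsigned long local, unsigned long remote,
			    unsigned long private, unsigned long shared,
			    bool patched)
{
	int period_slot = 1;	/* stand-in for numa_scan_period / SLOTS */
	int lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
	int ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
	int sp_ratio = (shared * NUMA_PERIOD_SLOTS) / (private + shared);
	/* The pre-patch code keys the first branch off the private ratio. */
	int first = patched ? lr_ratio : ps_ratio;
	int second = patched ? sp_ratio : lr_ratio;
	int slot, ratio;

	if (first >= NUMA_PERIOD_THRESHOLD) {
		slot = first - NUMA_PERIOD_THRESHOLD;
		if (!slot)
			slot = 1;
		return slot * period_slot;	/* slow scanning down */
	} else if (second >= NUMA_PERIOD_THRESHOLD) {
		slot = second - NUMA_PERIOD_THRESHOLD;
		if (!slot)
			slot = 1;
		return slot * period_slot;	/* slow scanning down */
	}

	ratio = first > second ? first : second;
	return -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot; /* speed up */
}

int main(void)
{
	/* 80% of faults are remote and 80% are shared. */
	unsigned long local = 2, remote = 8, private = 2, shared = 8;

	printf("pre-patch diff: %d\n",	/* prints -5: scanning speeds up */
	       scan_period_diff(local, remote, private, shared, false));
	printf("patched diff:   %d\n",	/* prints  1: scanning slows down */
	       scan_period_diff(local, remote, private, shared, true));
	return 0;
}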

Second, the performance numbers. I'm sorry that this is a shortcoming
of my submission. Which benchmark should I use, e.g.
autonuma-benchmark? I don't have access to a physical server with
multiple NUMA nodes for testing, so would results from a virtual
machine be acceptable? Besides, this patch is only meant to restore
the original design, not to optimize, so is a performance result
necessary?

Best regards.