2024-05-24 13:41:35

by Chunxin Zang

Subject: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

I found that some tasks have been running long enough to become
ineligible, but they still do not release the CPU. This increases the
scheduling delay of other processes. Therefore, I tried checking the
current task in wakeup_preempt and entity_tick, and rescheduling the
cfs_rq if it is ineligible.

The modification can reduce the scheduling delay by about 30% when
RUN_TO_PARITY is enabled.
So far, it has been running well in my test environment, and I have
pasted some test results below.

I isolated four cores for testing. I ran Hackbench in the background
and observed the test results of cyclictest.

hackbench -g 4 -l 100000000 &
cyclictest --mlockall -D 5m -q

                               EEVDF     PATCH     EEVDF-NO_PARITY  PATCH-NO_PARITY

             # Min Latencies:  00006     00006     00006            00006
LNICE(-19)   # Avg Latencies:  00191     00122     00089            00066
             # Max Latencies:  15442     07648     14133            07713

             # Min Latencies:  00006     00010     00006            00006
LNICE(0)     # Avg Latencies:  00466     00277     00289            00257
             # Max Latencies:  38917     32391     32665            17710

             # Min Latencies:  00019     00053     00010            00013
LNICE(19)    # Avg Latencies:  37151     31045     18293            23035
             # Max Latencies:  2688299   7031295   426196           425708

I'm actually a bit hesitant about placing this modification under the
NO_PARITY feature. This is because the modification conflicts with the
semantics of RUN_TO_PARITY. So, I captured and compared the number of
resched occurrences in wakeup_preempt to see if it introduced any
additional overhead.

Similarly, hackbench is used to stress the utilization of four cores to
100%, and the method for capturing the number of PREEMPT occurrences is
referenced from [1].

schedstats                        EEVDF     PATCH     EEVDF-NO_PARITY  PATCH-NO_PARITY  CFS(6.5)
stats.check_preempt_count         5053054   5057286   5003806          5018589          5031908
stats.patch_cause_preempt_count   -------   858044    -------          765726           -------
stats.need_preempt_count          570520    858684    3380513          3426977          1140821

From the above test results, there is a slight increase in the number of
resched occurrences in wakeup_preempt. However, the results vary with each
test, and sometimes the difference is not that significant. But overall,
the count of reschedules remains lower than that of CFS and is much less
than that of NO_PARITY.

[1]: https://lore.kernel.org/all/[email protected]/T/#m52057282ceb6203318be1ce9f835363de3bef5cb

Signed-off-by: Chunxin Zang <[email protected]>
Reviewed-by: Chen Yang <[email protected]>
---
kernel/sched/fair.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..a0005d240db5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
return;
#endif
+
+ if (!entity_eligible(cfs_rq, curr))
+ resched_curr(rq_of(cfs_rq));
}


@@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
return;

+ if (!entity_eligible(cfs_rq, se))
+ goto preempt;
+
find_matching_se(&se, &pse);
WARN_ON_ONCE(!pse);

--
2.34.1



2024-05-24 15:35:15

by Chen Yu

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.
>
> The modification can reduce the scheduling delay by about 30% when
> RUN_TO_PARITY is enabled.
> So far, it has been running well in my test environment, and I have
> pasted some test results below.
>

Interesting. Besides hackbench, I assume that you have a workload in a
real production environment that is sensitive to wakeup latency?

>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
> return;
> #endif
> +
> + if (!entity_eligible(cfs_rq, curr))
> + resched_curr(rq_of(cfs_rq));
> }
>

entity_tick() -> update_curr() -> update_deadline():
	if (se->vruntime >= se->deadline) resched_curr()
Only when current has expired its slice will it be scheduled out.

So here you want to schedule current out as soon as its lag drops to 0.
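
As an aside, a minimal userspace sketch of what "eligible" means here
(all names hypothetical, plain floating point rather than the kernel's
fixed-point avg_vruntime() arithmetic): an entity is eligible when its
vruntime has not run past the load-weighted average vruntime V of the
queue, i.e. its lag V - v_i is still non-negative:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct entity { double vruntime; double weight; };

/* Sketch only: eligible <=> lag = V - v_se >= 0, where V is the
 * load-weighted average vruntime of all runnable entities (including
 * current). The kernel's entity_eligible() makes the same comparison
 * with integer math relative to cfs_rq->min_vruntime. */
static bool eligible(const struct entity *q, size_t n,
		     const struct entity *se)
{
	double wsum = 0.0, wvsum = 0.0;
	size_t i;

	for (i = 0; i < n; i++) {
		wsum += q[i].weight;
		wvsum += q[i].weight * q[i].vruntime;
	}
	/* se->vruntime <= V  <=>  se->vruntime * wsum <= wvsum */
	return wsum > 0.0 && se->vruntime * wsum <= wvsum;
}

int main(void)
{
	struct entity q[] = { { 90.0, 2.0 }, { 100.0, 1.0 }, { 130.0, 1.0 } };

	/* V = (2*90 + 100 + 130) / 4 = 102.5, so the entity at vruntime
	 * 130 has negative lag and is reported as ineligible. */
	printf("eligible = %d\n", eligible(q, 3, &q[2]));
	return 0;
}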

In the latest sched/eevdf branch, this is controlled by two sched features:
RESPECT_SLICE: Inhibit preemption until the current task has exhausted its slice.
RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c

Maybe something like this can achieve your goal:

	if (sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, curr))
		resched_curr(rq_of(cfs_rq));

>
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> return;
>
> + if (!entity_eligible(cfs_rq, se))
> + goto preempt;
> +

Not sure if this is applicable: later in this function, pick_eevdf() already
checks whether the current is eligible via !entity_eligible(cfs_rq, curr),
and if not, curr will be evicted. Also, this change does not consider the
cgroup hierarchy.

Besides, the check of current's eligibility can give a false negative result
if the enqueued entity has a positive lag. Prateek proposed to remove the
check of current's eligibility in pick_eevdf():
https://lore.kernel.org/lkml/[email protected]/

If I understand your requirement correctly, you want to reduce the wakeup
latency. There is some code under development by Peter, which can
customize a task's wakeup latency by setting its slice:
https://lore.kernel.org/lkml/[email protected]/

thanks,
Chenyu

2024-05-25 06:42:13

by Mike Galbraith

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On Fri, 2024-05-24 at 21:40 +0800, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.

My box gave making the XXX below reality a two thumbs up when fiddling
with the original unfettered and a bit harsh RUN_TO_PARITY.

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..922834f172b0 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8413,12 +8413,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
update_curr(cfs_rq);

/*
- * XXX pick_eevdf(cfs_rq) != se ?
+ * Run @curr until it is no longer our best option. Basing the preempt
+ * decision on @curr reselection puts any previous decisions back on the
+ * table in context "now", including granularity preservation decisions
+ * by RUN_TO_PARITY.
*/
- if (pick_eevdf(cfs_rq) == pse)
- goto preempt;
-
- return;
+ if (pick_eevdf(cfs_rq) == se)
+ return;

preempt:
resched_curr(rq);


2024-05-25 11:58:11

by Chen Yu

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On 2024-05-25 at 08:41:28 +0200, Mike Galbraith wrote:
> On Fri, 2024-05-24 at 21:40 +0800, Chunxin Zang wrote:
> > I found that some tasks have been running for a long enough time and
> > have become illegal, but they are still not releasing the CPU. This
> > will increase the scheduling delay of other processes. Therefore, I
> > tried checking the current process in wakeup_preempt and entity_tick,
> > and if it is illegal, reschedule that cfs queue.
>
> My box gave making the XXX below reality a two thumbs up when fiddling
> with the original unfettered and a bit harsh RUN_TO_PARITY.
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..922834f172b0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8413,12 +8413,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> update_curr(cfs_rq);
>
> /*
> - * XXX pick_eevdf(cfs_rq) != se ?
> + * Run @curr until it is no longer our best option. Basing the preempt
> + * decision on @curr reselection puts any previous decisions back on the
> + * table in context "now", including granularity preservation decisions
> + * by RUN_TO_PARITY.
> */
> - if (pick_eevdf(cfs_rq) == pse)
> - goto preempt;
> -
> - return;
> + if (pick_eevdf(cfs_rq) == se)
> + return;
>

I suppose this change benefits the overloaded scenario:
neither current nor the wakee is the best one.

before: current continues to run.
after: best se in the tree preempts current.

hackbench -g 12 -l 1000000000 & (480 tasks, 2x of the CPUs)

cyclictest --mlockall -D 1m -q
before:
T: 0 (15983) P: 0 I:1000 C: 43054 Min: 11 Act: 144 Avg: 627 Max: 11446

after:
T: 0 (16473) P: 0 I:1000 C: 49822 Min: 7 Act: 160 Avg: 388 Max: 10190

Min, Avg, Max latency all decreased.

thanks,
Chenyu

2024-05-25 12:40:37

by Honglei Wang

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



On 2024/5/24 21:40, Chunxin Zang wrote:
> I found that some tasks have been running for a long enough time and
> have become illegal, but they are still not releasing the CPU. This
> will increase the scheduling delay of other processes. Therefore, I
> tried checking the current process in wakeup_preempt and entity_tick,
> and if it is illegal, reschedule that cfs queue.
>
> The modification can reduce the scheduling delay by about 30% when
> RUN_TO_PARITY is enabled.
> So far, it has been running well in my test environment, and I have
> pasted some test results below.
>
> I isolated four cores for testing. I ran Hackbench in the background
> and observed the test results of cyclictest.
>
> hackbench -g 4 -l 100000000 &
> cyclictest --mlockall -D 5m -q
>
> EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY
>
> # Min Latencies: 00006 00006 00006 00006
> LNICE(-19) # Avg Latencies: 00191 00122 00089 00066
> # Max Latencies: 15442 07648 14133 07713
>
> # Min Latencies: 00006 00010 00006 00006
> LNICE(0) # Avg Latencies: 00466 00277 00289 00257
> # Max Latencies: 38917 32391 32665 17710
>
> # Min Latencies: 00019 00053 00010 00013
> LNICE(19) # Avg Latencies: 37151 31045 18293 23035
> # Max Latencies: 2688299 7031295 426196 425708
>
> I'm actually a bit hesitant about placing this modification under the
> NO_PARITY feature. This is because the modification conflicts with the
> semantics of RUN_TO_PARITY. So, I captured and compared the number of
> resched occurrences in wakeup_preempt to see if it introduced any
> additional overhead.
>
> Similarly, hackbench is used to stress the utilization of four cores to
> 100%, and the method for capturing the number of PREEMPT occurrences is
> referenced from [1].
>
> schedstats EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY CFS(6.5)
> stats.check_preempt_count 5053054 5057286 5003806 5018589 5031908
> stats.patch_cause_preempt_count ------- 858044 ------- 765726 -------
> stats.need_preempt_count 570520 858684 3380513 3426977 1140821
>
> From the above test results, there is a slight increase in the number of
> resched occurrences in wakeup_preempt. However, the results vary with each
> test, and sometimes the difference is not that significant. But overall,
> the count of reschedules remains lower than that of CFS and is much less
> than that of NO_PARITY.
>
> [1]: https://lore.kernel.org/all/[email protected]/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
>
> Signed-off-by: Chunxin Zang <[email protected]>
> Reviewed-by: Chen Yang <[email protected]>
> ---
> kernel/sched/fair.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
> return;
> #endif
> +
> + if (!entity_eligible(cfs_rq, curr))
> + resched_curr(rq_of(cfs_rq));
> }
>
>
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> return;
>
> + if (!entity_eligible(cfs_rq, se))
> + goto preempt;
> +
> find_matching_se(&se, &pse);
> WARN_ON_ONCE(!pse);
>
Hi Chunxin,

Did you run a comparative test to see which modification is more helpful
in improving the latency? The modification at the tick point makes more
sense to me. But it seems that rescheduling arbitrarily in wakeup might
introduce too much preemption (and maybe more context switches?) in
complex environments such as a cgroup hierarchy.

Thanks,
Honglei


2024-05-25 17:23:06

by Mike Galbraith

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On Sat, 2024-05-25 at 19:57 +0800, Chen Yu wrote:
>
> I suppose this change benefits the overloaded scenario:
> neither current nor the wakee is the best one.

Depends on your definition of benefit. It'll increase ctx switches a
bit, but I recall it not being much.

I dug up the script I was using at the time, numbers below for the
bored. Bottom line: yeah, it's not much of a delta, especially when
comparing allegedly current EEVDF to CFS in an otherwise identical..
absolutely everything.

load: 5m chrome playing 1080p clip vs massive_intr (1 88% hog/cpu)

6.1.91-cfs
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(9) |1897454.685 ms | 11581161 | avg: 0.026 ms | max: 43.008 ms | sum:296184.363 ms |
dav1d-worker:(8) | 94252.513 ms | 396284 | avg: 0.089 ms | max: 12.513 ms | sum:35275.546 ms |
Compositor:5824 | 36851.590 ms | 61771 | avg: 0.080 ms | max: 9.310 ms | sum: 4965.456 ms |
X:2362 | 32306.450 ms | 102571 | avg: 0.021 ms | max: 14.967 ms | sum: 2148.121 ms |
VizCompositorTh:5913 | 25116.956 ms | 56602 | avg: 0.053 ms | max: 8.441 ms | sum: 2986.101 ms |
chrome:(8) | 23134.386 ms | 85335 | avg: 0.052 ms | max: 34.540 ms | sum: 4459.871 ms |
ThreadPoolForeg:(43) | 16742.353 ms | 71410 | avg: 0.083 ms | max: 23.059 ms | sum: 5943.056 ms |
kwin_x11:2776 | 11383.572 ms | 95643 | avg: 0.017 ms | max: 8.358 ms | sum: 1589.414 ms |
VideoFrameCompo:5919 | 9589.949 ms | 37838 | avg: 0.029 ms | max: 6.842 ms | sum: 1098.123 ms |
kworker/5:1+eve:4508 | 8743.004 ms | 1647598 | avg: 0.003 ms | max: 12.002 ms | sum: 4956.587 ms |
kworker/6:2-mm_:5407 | 8686.689 ms | 1636766 | avg: 0.003 ms | max: 10.407 ms | sum: 4779.475 ms |
kworker/2:0-mm_:5707 | 8536.257 ms | 1607213 | avg: 0.003 ms | max: 9.473 ms | sum: 4776.918 ms |
kworker/4:1-mm_:379 | 8532.410 ms | 1603438 | avg: 0.003 ms | max: 10.328 ms | sum: 4824.572 ms |
kworker/1:0-eve:5409 | 8508.321 ms | 1598742 | avg: 0.003 ms | max: 13.124 ms | sum: 4742.128 ms |
perf:(2) | 5386.613 ms | 713 | avg: 0.020 ms | max: 2.268 ms | sum: 13.985 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2242804.984 ms | 26015202 | | 43.008 ms | 416326.240 ms |
----------------------------------------------------------------------------------------------------------

6.1.91-eevdf
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(9) |1971557.115 ms | 6127207 | avg: 0.034 ms | max: 16.351 ms | sum:208289.732 ms |
dav1d-worker:(8) | 85561.180 ms | 499175 | avg: 0.262 ms | max: 15.656 ms | sum:130584.659 ms |
Compositor:4346 | 37730.564 ms | 200925 | avg: 0.112 ms | max: 10.922 ms | sum:22406.729 ms |
X:2379 | 31761.636 ms | 229381 | avg: 0.081 ms | max: 9.740 ms | sum:18645.752 ms |
VizCompositorTh:4423 | 24650.743 ms | 155138 | avg: 0.170 ms | max: 11.227 ms | sum:26426.655 ms |
chrome:(8) | 19551.099 ms | 156680 | avg: 0.201 ms | max: 18.183 ms | sum:31449.401 ms |
ThreadPoolForeg:(43) | 15547.777 ms | 89292 | avg: 0.223 ms | max: 20.007 ms | sum:19916.046 ms |
kwin_x11:2776 | 11052.045 ms | 119945 | avg: 0.122 ms | max: 12.757 ms | sum:14687.478 ms |
VideoFrameCompo:4429 | 8794.874 ms | 76728 | avg: 0.142 ms | max: 10.183 ms | sum:10895.692 ms |
Chrome_ChildIOT:(7) | 4917.764 ms | 165906 | avg: 0.190 ms | max: 10.212 ms | sum:31461.521 ms |
Media:4428 | 3787.952 ms | 65288 | avg: 0.194 ms | max: 12.048 ms | sum:12662.386 ms |
kworker/6:1-eve:135 | 3359.276 ms | 616547 | avg: 0.009 ms | max: 7.999 ms | sum: 5762.212 ms |
kworker/4:1-eve:365 | 3144.292 ms | 578287 | avg: 0.009 ms | max: 7.619 ms | sum: 5322.637 ms |
kworker/3:2-eve:297 | 3104.034 ms | 557150 | avg: 0.013 ms | max: 8.006 ms | sum: 7050.461 ms |
perf:(2) | 3098.480 ms | 1585 | avg: 0.102 ms | max: 5.470 ms | sum: 160.995 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2271694.585 ms | 16259483 | | 32.144 ms | 669428.151 ms |
----------------------------------------------------------------------------------------------------------

+tweak
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(9) |1965603.161 ms | 6284089 | avg: 0.034 ms | max: 16.005 ms | sum:214120.602 ms |
dav1d-worker:(8) | 89853.413 ms | 599733 | avg: 0.240 ms | max: 48.387 ms | sum:144117.080 ms |
Compositor:4342 | 36473.771 ms | 171986 | avg: 0.129 ms | max: 11.366 ms | sum:22135.405 ms |
X:2365 | 32167.915 ms | 218157 | avg: 0.088 ms | max: 9.841 ms | sum:19105.816 ms |
VizCompositorTh:4425 | 24338.749 ms | 151884 | avg: 0.181 ms | max: 11.755 ms | sum:27553.783 ms |
chrome:(8) | 20154.023 ms | 158554 | avg: 0.207 ms | max: 15.979 ms | sum:32742.291 ms |
ThreadPoolForeg:(45) | 15672.931 ms | 94051 | avg: 0.215 ms | max: 17.452 ms | sum:20185.561 ms |
kwin_x11:2773 | 11424.789 ms | 121491 | avg: 0.140 ms | max: 11.116 ms | sum:16958.020 ms |
VideoFrameCompo:4431 | 8869.431 ms | 82385 | avg: 0.139 ms | max: 10.906 ms | sum:11471.193 ms |
Chrome_ChildIOT:(7) | 5148.973 ms | 167824 | avg: 0.189 ms | max: 13.755 ms | sum:31640.759 ms |
kworker/7:1-eve:86 | 4258.124 ms | 784269 | avg: 0.009 ms | max: 8.228 ms | sum: 6780.999 ms |
Media:4430 | 3897.705 ms | 62985 | avg: 0.205 ms | max: 10.797 ms | sum:12904.412 ms |
kworker/6:1-eve:189 | 3608.493 ms | 663349 | avg: 0.009 ms | max: 7.902 ms | sum: 6034.231 ms |
kworker/5:2-eve:827 | 3309.865 ms | 611424 | avg: 0.009 ms | max: 7.112 ms | sum: 5552.591 ms |
perf:(2) | 3241.897 ms | 1847 | avg: 0.087 ms | max: 5.464 ms | sum: 160.383 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2272683.607 ms | 16721925 | | 57.181 ms | 692810.431 ms |
----------------------------------------------------------------------------------------------------------
hohum
+peterz queue w. RUN_TO_PARITY
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(9) |1972481.970 ms | 4989513 | avg: 0.042 ms | max: 20.019 ms | sum:208651.087 ms |
dav1d-worker:(8) | 85235.372 ms | 528422 | avg: 0.254 ms | max: 15.253 ms | sum:134274.493 ms |
Compositor:4343 | 36977.626 ms | 154214 | avg: 0.122 ms | max: 9.868 ms | sum:18854.543 ms |
X:2359 | 31873.877 ms | 187392 | avg: 0.094 ms | max: 10.100 ms | sum:17644.947 ms |
VizCompositorTh:4427 | 24881.223 ms | 120412 | avg: 0.176 ms | max: 14.813 ms | sum:21210.898 ms |
chrome:(8) | 21579.151 ms | 133086 | avg: 0.200 ms | max: 12.952 ms | sum:26600.419 ms |
ThreadPoolForeg:(45) | 15327.978 ms | 94395 | avg: 0.196 ms | max: 35.000 ms | sum:18547.639 ms |
kwin_x11:2776 | 11232.090 ms | 121392 | avg: 0.135 ms | max: 10.313 ms | sum:16426.213 ms |
VideoFrameCompo:4433 | 8858.806 ms | 65658 | avg: 0.144 ms | max: 11.409 ms | sum: 9485.191 ms |
Chrome_ChildIOT:(7) | 4970.611 ms | 172570 | avg: 0.142 ms | max: 11.008 ms | sum:24467.160 ms |
Media:4432 | 3781.277 ms | 63640 | avg: 0.162 ms | max: 10.096 ms | sum:10283.264 ms |
kworker/7:1-eve:91 | 2930.823 ms | 534857 | avg: 0.009 ms | max: 8.234 ms | sum: 4723.577 ms |
kworker/6:2-eve:356 | 2579.393 ms | 472864 | avg: 0.009 ms | max: 8.046 ms | sum: 4148.828 ms |
perf:(2) | 2569.531 ms | 1609 | avg: 0.101 ms | max: 5.966 ms | sum: 163.224 ms |
kworker/4:0-eve:40 | 2432.133 ms | 442300 | avg: 0.009 ms | max: 9.475 ms | sum: 3992.979 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2263072.188 ms | 12993836 | | 35.000 ms | 601609.374 ms |
----------------------------------------------------------------------------------------------------------
marko?
+NO_DELAY_DEQUEUE
----------------------------------------------------------------------------------------------------------
Task | Runtime ms | Switches | Avg delay ms | Max delay ms | Sum delay ms |
----------------------------------------------------------------------------------------------------------
massive_intr:(9) |1968212.427 ms | 6050894 | avg: 0.035 ms | max: 20.032 ms | sum:213163.997 ms |
dav1d-worker:(8) | 86929.255 ms | 583692 | avg: 0.246 ms | max: 14.986 ms | sum:143561.571 ms |
Compositor:4933 | 36733.711 ms | 219265 | avg: 0.100 ms | max: 14.986 ms | sum:21888.378 ms |
X:2359 | 31624.338 ms | 233581 | avg: 0.074 ms | max: 8.629 ms | sum:17324.762 ms |
VizCompositorTh:5018 | 24597.941 ms | 179049 | avg: 0.147 ms | max: 11.717 ms | sum:26333.576 ms |
chrome:(8) | 20430.046 ms | 179393 | avg: 0.173 ms | max: 20.903 ms | sum:30976.208 ms |
ThreadPoolForeg:(39) | 15423.142 ms | 109837 | avg: 0.183 ms | max: 24.525 ms | sum:20115.906 ms |
kwin_x11:2776 | 11413.866 ms | 129426 | avg: 0.121 ms | max: 10.718 ms | sum:15719.900 ms |
VideoFrameCompo:5023 | 8817.956 ms | 78028 | avg: 0.130 ms | max: 18.471 ms | sum:10162.602 ms |
Chrome_ChildIOT:(7) | 5356.461 ms | 187001 | avg: 0.160 ms | max: 11.565 ms | sum:29969.033 ms |
Media:5022 | 3793.341 ms | 64887 | avg: 0.186 ms | max: 13.229 ms | sum:12096.948 ms |
kworker/6:0-eve:5052 | 3509.228 ms | 643562 | avg: 0.010 ms | max: 8.005 ms | sum: 6305.605 ms |
kworker/3:0-eve:34 | 3363.538 ms | 598417 | avg: 0.012 ms | max: 8.892 ms | sum: 6910.297 ms |
perf:(2) | 3167.463 ms | 1835 | avg: 0.090 ms | max: 5.039 ms | sum: 164.352 ms |
kworker/4:2+eve:4808 | 3002.682 ms | 549210 | avg: 0.010 ms | max: 8.622 ms | sum: 5400.444 ms |
----------------------------------------------------------------------------------------------------------
TOTAL: |2270484.307 ms | 16315986 | | 24.525 ms | 677870.230 ms |
----------------------------------------------------------------------------------------------------------
polo


2024-05-27 08:05:35

by Peter Zijlstra

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On Sat, May 25, 2024 at 08:41:28AM +0200, Mike Galbraith wrote:

> - if (pick_eevdf(cfs_rq) == pse)
> - goto preempt;
> -
> - return;
> + if (pick_eevdf(cfs_rq) == se)
> + return;

Right, this will preempt more.

This is probably going to make Prateek's case worse though. Then again,
I was already leaning towards not making his stronger slice protection
the default, because it simply hurts too much elsewhere.

Still, his observation that placing tasks can move V left, which in turn
can make the just-scheduled-in current non-eligible and cause
over-scheduling, is valid -- just not sure what to do about it yet.
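
To make that concrete, a toy example with made-up numbers and unit
weights (so V is a plain average) of how a wakeup placement with
positive lag drags V left past current:

#include <stdio.h>

/* Two runnable entities at vruntime 100 -> V = 100, current is (just)
 * eligible. Placing a wakee at vruntime 70 (positive lag) pulls V down
 * to 90, so the just-scheduled-in current (vruntime 100) is suddenly
 * ineligible and an eager "resched when ineligible" rule would kick it
 * out right away -- the over-scheduling mentioned above. */
int main(void)
{
	double v_curr = 100.0, v_other = 100.0, v_wakee = 70.0;
	double V_before = (v_curr + v_other) / 2.0;
	double V_after = (v_curr + v_other + v_wakee) / 3.0;

	printf("V before wakeup: %.1f, curr eligible: %d\n",
	       V_before, v_curr <= V_before);
	printf("V after  wakeup: %.1f, curr eligible: %d\n",
	       V_after, v_curr <= V_after);
	return 0;
}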

2024-05-27 09:54:27

by Mike Galbraith

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On Mon, 2024-05-27 at 10:05 +0200, Peter Zijlstra wrote:
> On Sat, May 25, 2024 at 08:41:28AM +0200, Mike Galbraith wrote:
>
> > -       if (pick_eevdf(cfs_rq) == pse)
> > -               goto preempt;
> > -
> > -       return;
> > +       if (pick_eevdf(cfs_rq) == se)
> > +               return;
>
> Right, this will preempt more.

Yeah, and for no tangible benefit that I can see. Repeating the mixed
load GUI vs compute testing a bunch of times, there's enough variance
to swamp any signal.

-Mike

2024-05-28 02:43:18

by Chunxin Zang

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible


> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
>
> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>> I found that some tasks have been running for a long enough time and
>> have become illegal, but they are still not releasing the CPU. This
>> will increase the scheduling delay of other processes. Therefore, I
>> tried checking the current process in wakeup_preempt and entity_tick,
>> and if it is illegal, reschedule that cfs queue.
>>
>> The modification can reduce the scheduling delay by about 30% when
>> RUN_TO_PARITY is enabled.
>> So far, it has been running well in my test environment, and I have
>> pasted some test results below.
>>
>
> Interesting, besides hackbench, I assume that you have workload in
> real production environment that is sensitive to wakeup latency?

Hi Chen

Yes, my workloads are quite sensitive to wakeup latency.
>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..a0005d240db5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>> return;
>> #endif
>> +
>> + if (!entity_eligible(cfs_rq, curr))
>> + resched_curr(rq_of(cfs_rq));
>> }
>>
>
> entity_tick() -> update_curr() -> update_deadline():
> se->vruntime >= se->deadline ? resched_curr()
> only current has expired its slice will it be scheduled out.
>
> So here you want to schedule current out if its lag becomes 0.
>
> In lastest sched/eevdf branch, it is controlled by two sched features:
> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>
> Maybe something like this can achieve your goal
> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
> resched_curr
>
>>
>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>> return;
>>
>> + if (!entity_eligible(cfs_rq, se))
>> + goto preempt;
>> +
>
> Not sure if this is applicable, later in this function, pick_eevdf() checks
> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
> be evicted. And this change does not consider the cgroup hierarchy.
>
> Besides, the check of current eligiblity can get false negative result,
> if the enqueued entity has a positive lag. Prateek proposed to
> remove the check of current's eligibility in pick_eevdf():
> https://lore.kernel.org/lkml/[email protected]/

Thank you for letting me know about Peter's latest updates and thoughts.
Actually, the original intention of my modification was to minimize the
traversal of the rb-tree as much as possible. For example, in the following
scenario, if 'curr' is ineligible, the system would still traverse the
rb-tree in 'pick_eevdf' to return an optimal 'se', and then trigger
'resched_curr'. After the resched, the scheduler calls 'pick_eevdf' again,
traversing the rb-tree once more. This ultimately results in the rb-tree
being traversed twice. If it were possible to determine that 'curr' is
ineligible within 'wakeup_preempt' and directly trigger a resched, it would
save one traversal of the rb-tree.


wakeup_preempt -> pick_eevdf -> resched_curr
                      |               |
                      |-> 'traverse the rb-tree'
                                      |
                                      -> schedule -> pick_eevdf
                                                         |
                                                         |-> 'traverse the rb-tree'
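
For clarity, a rough userspace sketch of what each 'traverse the rb-tree'
step has to compute (names are made up; the real pick_eevdf() walks an
augmented rb-tree instead of doing this linear scan): among the eligible
entities, pick the one with the earliest virtual deadline.

#include <stddef.h>
#include <stdio.h>

struct entity { double vruntime, deadline, weight; };

/* Sketch of the EEVDF pick: skip entities whose vruntime is already past
 * the weighted average V (ineligible), then take the earliest deadline. */
static const struct entity *pick_sketch(const struct entity *q, size_t n)
{
	double wsum = 0.0, wvsum = 0.0;
	const struct entity *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		wsum += q[i].weight;
		wvsum += q[i].weight * q[i].vruntime;
	}
	for (i = 0; i < n; i++) {
		if (q[i].vruntime * wsum > wvsum)	/* ineligible */
			continue;
		if (!best || q[i].deadline < best->deadline)
			best = &q[i];
	}
	return best;
}

int main(void)
{
	struct entity q[] = {
		{ 100.0, 112.0, 1.0 },	/* current */
		{  95.0, 103.0, 1.0 },	/* earliest eligible deadline */
		{ 120.0, 101.0, 1.0 },	/* earliest deadline, but ineligible: V = 105 */
	};
	const struct entity *pick = pick_sketch(q, 3);

	if (pick)
		printf("picked deadline: %.1f\n", pick->deadline);
	return 0;
}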


Of course, this would break the semantics of RESPECT_SLICE as well as
RUN_TO_PARITY. So, this might be considered a performance enhancement
for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.

thanks
Chunxin


> If I understand your requirement correctly, you want to reduce the wakeup
> latency. There are some codes under developed by Peter, which could
> customized task's wakeup latency via setting its slice:
> https://lore.kernel.org/lkml/[email protected]/
>
> thanks,
> Chenyu









2024-05-28 05:02:56

by K Prateek Nayak

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

Hello Chunxin,

On 5/28/2024 8:12 AM, Chunxin Zang wrote:
>
>> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
>>
>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>>
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>>
>>
>> Interesting, besides hackbench, I assume that you have workload in
>> real production environment that is sensitive to wakeup latency?
>
> Hi Chen
>
> Yes, my workload are quite sensitive to wakeup latency .
>>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> return;
>>> #endif
>>> +
>>> + if (!entity_eligible(cfs_rq, curr))
>>> + resched_curr(rq_of(cfs_rq));
>>> }
>>>
>>
>> entity_tick() -> update_curr() -> update_deadline():
>> se->vruntime >= se->deadline ? resched_curr()
>> only current has expired its slice will it be scheduled out.
>>
>> So here you want to schedule current out if its lag becomes 0.
>>
>> In lastest sched/eevdf branch, it is controlled by two sched features:
>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>
>> Maybe something like this can achieve your goal
>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>> resched_curr
>>
>>>
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>> be evicted. And this change does not consider the cgroup hierarchy.

The above line will be referred to as [1] below.

>>
>> Besides, the check of current eligiblity can get false negative result,
>> if the enqueued entity has a positive lag. Prateek proposed to
>> remove the check of current's eligibility in pick_eevdf():
>> https://lore.kernel.org/lkml/[email protected]/
>
> Thank you for letting me know about Peter's latest updates and thoughts.
> Actually, the original intention of my modification was to minimize the
> traversal of the rb-tree as much as possible. For example, in the following
> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
> resched, the scheduler will call 'pick_eevdf' again, traversing the
> rb-tree once more. This ultimately results in the rb-tree being traversed
> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> by one time.
>
>
> wakeup_preempt-> pick_eevdf -> resched_curr
> |->'traverse the rb-tree' |
> schedule->pick_eevdf
> |->'traverse the rb-tree'

I see what you mean but a couple of things:

(I'm adding the check_preempt_wakeup_fair() hunk from the original patch
below for ease of interpretation)

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 03be0d1330a6..a0005d240db5 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> return;
>
> + if (!entity_eligible(cfs_rq, se))
> + goto preempt;
> +

This check uses the root cfs_rq since "task_cfs_rq()" returns the
"rq->cfs" of the runqueue the task is on. In the presence of cgroups or
CONFIG_SCHED_AUTOGROUP, there is a good chance that the task is queued
on a higher order cfs_rq, and this entity_eligible() calculation might
not be valid since the vruntime calculation for the "se" is relative to
the "cfs_rq" it is queued on. Please correct me if I'm wrong but I
believe that is what Chenyu was referring to in [1].

> find_matching_se(&se, &pse);
> WARN_ON_ONCE(!pse);
>
> --

In addition to that, there is an update_curr() call below for the first
cfs_rq where both entities' hierarchies are queued, which is found by
find_matching_se(). I believe that is also required to update the
vruntime and deadline of the entity where preemption can happen.

If you want to circumvent a second call to pick_eevdf(), could you
perhaps do:

(Only build tested)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9eb63573110c..653b1bee1e62 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
update_curr(cfs_rq);

/*
- * XXX pick_eevdf(cfs_rq) != se ?
+ * If the hierarchy of current task is ineligible at the common
+ * point on the newly woken entity, there is a good chance of
+ * wakeup preemption by the newly woken entity. Mark for resched
+ * and allow pick_eevdf() in schedule() to judge which task to
+ * run next.
*/
- if (pick_eevdf(cfs_rq) == pse)
+ if (!entity_eligible(cfs_rq, se))
goto preempt;

return;

--

There are other implications here, which are specifically highlighted by
the "XXX pick_eevdf(cfs_rq) != se ?" comment. Even if the waking
entity is not the entity with the earliest eligible virtual deadline,
the current task is still preempted if any other entity has the EEVD.

Mike's box gave switching to above two thumbs up; I have to check what
my box says :)

Following are DeathStarBench results with your original patch compared
to v6.9-rc5 based tip:sched/core:

==================================================================
Test : DeathStarBench
Why? : Some tasks here do not like aggressive preemption
Units : Normalized throughput
Interpretation: Higher is better
Statistic : Mean
==================================================================
Pinning scaling tip eager_preempt (pct imp)
1CCD 1 1.00 0.99 (%diff: -1.13%)
2CCD 2 1.00 0.97 (%diff: -3.21%)
4CCD 3 1.00 0.97 (%diff: -3.41%)
8CCD 6 1.00 0.97 (%diff: -3.20%)
--

I'll give the variants mentioned in the thread a try too, to see if
some of my assumptions around heavy preemption hold good. I was also
able to dig up an old patch by Balakumaran Kannan which skipped
pick_eevdf() altogether if "pse" is ineligible. That also seems like
a good optimization based on the current check in
check_preempt_wakeup_fair(), but it perhaps doesn't help the
wakeup-latency sensitivity you are optimizing for; it only avoids the
rb-tree traversal when there is no chance of pick_eevdf() returning
"pse" (rough sketch of the idea below):
https://lore.kernel.org/lkml/[email protected]/
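
Something along these lines against the current check (untested sketch of
that idea, not the actual patch):

	/*
	 * If the newly woken entity is not eligible it can never be
	 * returned by pick_eevdf(), so skip the rb-tree walk entirely
	 * and let current keep running.
	 */
	if (!entity_eligible(cfs_rq, pse))
		return;

	if (pick_eevdf(cfs_rq) == pse)
		goto preempt;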

--
Thanks and Regards,
Prateek

>
>
> Of course, this would break the semantics of RESPECT_SLICE as well as
> RUN_TO_PARITY. So, this might be considered a performance enhancement
> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>
> thanks
> Chunxin
>
>
>> If I understand your requirement correctly, you want to reduce the wakeup
>> latency. There are some codes under developed by Peter, which could
>> customized task's wakeup latency via setting its slice:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> thanks,
>> Chenyu


2024-05-28 06:42:02

by Chunxin Zang

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



> On May 28, 2024, at 10:42, Chunxin Zang <[email protected]> wrote:
>
>>
>> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
>>
>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>>
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>>
>>
>> Interesting, besides hackbench, I assume that you have workload in
>> real production environment that is sensitive to wakeup latency?
>
> Hi Chen
>
> Yes, my workload are quite sensitive to wakeup latency .
>>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> return;
>>> #endif
>>> +
>>> + if (!entity_eligible(cfs_rq, curr))
>>> + resched_curr(rq_of(cfs_rq));
>>> }
>>>
>>
>> entity_tick() -> update_curr() -> update_deadline():
>> se->vruntime >= se->deadline ? resched_curr()
>> only current has expired its slice will it be scheduled out.
>>
>> So here you want to schedule current out if its lag becomes 0.
>>
>> In lastest sched/eevdf branch, it is controlled by two sched features:
>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>
>> Maybe something like this can achieve your goal
>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>> resched_curr
>>
>>>
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>> be evicted. And this change does not consider the cgroup hierarchy.
>>
>> Besides, the check of current eligiblity can get false negative result,
>> if the enqueued entity has a positive lag. Prateek proposed to
>> remove the check of current's eligibility in pick_eevdf():
>> https://lore.kernel.org/lkml/[email protected]/
>
> Thank you for letting me know about Peter's latest updates and thoughts.
> Actually, the original intention of my modification was to minimize the
> traversal of the rb-tree as much as possible. For example, in the following
> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
> resched, the scheduler will call 'pick_eevdf' again, traversing the
> rb-tree once more. This ultimately results in the rb-tree being traversed
> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> by one time.
>
>
> wakeup_preempt-> pick_eevdf -> resched_curr
> |->'traverse the rb-tree' |
> schedule->pick_eevdf
> |->'traverse the rb-tree'
>
>
> Of course, this would break the semantics of RESPECT_SLICE as well as
> RUN_TO_PARITY. So, this might be considered a performance enhancement
> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>
Sorry for the mistake. I mean it should be a performance enhancement for scenarios
with NO_RESPECT_SLICE/NO_RUN_TO_PARITY.

Maybe it should be like this

@@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
return;

+ if (!sched_feat(RESPECT_SLICE) && !sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
+ goto preempt;
+

> thanks
> Chunxin
>
>
>> If I understand your requirement correctly, you want to reduce the wakeup
>> latency. There are some codes under developed by Peter, which could
>> customized task's wakeup latency via setting its slice:
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> thanks,
>> Chenyu



2024-05-28 07:19:30

by Chunxin Zang

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

Hi Prateek

> On May 28, 2024, at 13:02, K Prateek Nayak <[email protected]> wrote:
>
> Hello Chunxin,
>
> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
>>
>>> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
>>>
>>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>>> I found that some tasks have been running for a long enough time and
>>>> have become illegal, but they are still not releasing the CPU. This
>>>> will increase the scheduling delay of other processes. Therefore, I
>>>> tried checking the current process in wakeup_preempt and entity_tick,
>>>> and if it is illegal, reschedule that cfs queue.
>>>>
>>>> The modification can reduce the scheduling delay by about 30% when
>>>> RUN_TO_PARITY is enabled.
>>>> So far, it has been running well in my test environment, and I have
>>>> pasted some test results below.
>>>>
>>>
>>> Interesting, besides hackbench, I assume that you have workload in
>>> real production environment that is sensitive to wakeup latency?
>>
>> Hi Chen
>>
>> Yes, my workload are quite sensitive to wakeup latency .
>>>
>>>>
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 03be0d1330a6..a0005d240db5 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>>> return;
>>>> #endif
>>>> +
>>>> + if (!entity_eligible(cfs_rq, curr))
>>>> + resched_curr(rq_of(cfs_rq));
>>>> }
>>>>
>>>
>>> entity_tick() -> update_curr() -> update_deadline():
>>> se->vruntime >= se->deadline ? resched_curr()
>>> only current has expired its slice will it be scheduled out.
>>>
>>> So here you want to schedule current out if its lag becomes 0.
>>>
>>> In lastest sched/eevdf branch, it is controlled by two sched features:
>>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>>
>>> Maybe something like this can achieve your goal
>>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>>> resched_curr
>>>
>>>>
>>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>>> return;
>>>>
>>>> + if (!entity_eligible(cfs_rq, se))
>>>> + goto preempt;
>>>> +
>>>
>>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>>> be evicted. And this change does not consider the cgroup hierarchy.
>
> The above line will be referred to as [1] below.
>
>>>
>>> Besides, the check of current eligiblity can get false negative result,
>>> if the enqueued entity has a positive lag. Prateek proposed to
>>> remove the check of current's eligibility in pick_eevdf():
>>> https://lore.kernel.org/lkml/[email protected]/
>>
>> Thank you for letting me know about Peter's latest updates and thoughts.
>> Actually, the original intention of my modification was to minimize the
>> traversal of the rb-tree as much as possible. For example, in the following
>> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
>> 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
>> resched, the scheduler will call 'pick_eevdf' again, traversing the
>> rb-tree once more. This ultimately results in the rb-tree being traversed
>> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
>> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
>> by one time.
>>
>>
>> wakeup_preempt-> pick_eevdf -> resched_curr
>> |->'traverse the rb-tree' |
>> schedule->pick_eevdf
>> |->'traverse the rb-tree'
>
> I see what you mean but a couple of things:
>
> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
> below for ease of interpretation)
>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 03be0d1330a6..a0005d240db5 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>> return;
>>
>> + if (!entity_eligible(cfs_rq, se))
>> + goto preempt;
>> +
>
> This check uses the root cfs_rq since "task_cfs_rq()" returns the
> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
> CONFIG_SCHED_AUTOGROUP, there is a good chance this the task is queued
> on a higher order cfs_rq and this entity_eligible() calculation might
> not be valid since the vruntime calculation for the "se" is relative to
> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
> I believe that is what Chenyu was referring to in [1].


Thank you for explaining so much to me; I am trying to understand all of this. :)

>
>> find_matching_se(&se, &pse);
>> WARN_ON_ONCE(!pse);
>>
>> --
>
> In addition to that, There is an update_curr() call below for the first
> cfs_rq where both the entities' hierarchy is queued which is found by
> find_matching_se(). I believe that is required too to update the
> vruntime and deadline of the entity where preemption can happen.
>
> If you want to circumvent a second call to pick_eevdf(), could you
> perhaps do:
>
> (Only build tested)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9eb63573110c..653b1bee1e62 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> update_curr(cfs_rq);
>
> /*
> - * XXX pick_eevdf(cfs_rq) != se ?
> + * If the hierarchy of current task is ineligible at the common
> + * point on the newly woken entity, there is a good chance of
> + * wakeup preemption by the newly woken entity. Mark for resched
> + * and allow pick_eevdf() in schedule() to judge which task to
> + * run next.
> */
> - if (pick_eevdf(cfs_rq) == pse)
> + if (!entity_eligible(cfs_rq, se))
> goto preempt;
>
> return;
>
> --
>
> There are other implications here which is specifically highlighted by
> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
> entity is not the entity with the earliest eligible virtual deadline,
> the current task is still preempted if any other entity has the EEVD.
>
> Mike's box gave switching to above two thumbs up; I have to check what
> my box says :)
>
> Following are DeathStarBench results with your original patch compared
> to v6.9-rc5 based tip:sched/core:
>
> ==================================================================
> Test : DeathStarBench
> Why? : Some tasks here do no like aggressive preemption
> Units : Normalized throughput
> Interpretation: Higher is better
> Statistic : Mean
> ==================================================================
> Pinning scaling tip eager_preempt (pct imp)
> 1CCD 1 1.00 0.99 (%diff: -1.13%)
> 2CCD 2 1.00 0.97 (%diff: -3.21%)
> 4CCD 3 1.00 0.97 (%diff: -3.41%)
> 8CCD 6 1.00 0.97 (%diff: -3.20%)
> --

Please forgive me as I have not used the DeathStarBench suite before. Does
this test result indicate that my modifications have resulted in tasks that
do not like aggressive preemption being even less likely to be preempted?

thanks
Chunxin

> I'll give the variants mentioned in the thread a try too to see if
> some of my assumptions around heavy preemption hold good. I was also
> able to dig up an old patch by Balakumaran Kannan which skipped
> pick_eevdf() altogether if "pse" is ineligible which also seems like
> a good optimization based on current check in
> check_preempt_wakeup_fair() but it perhaps doesn't help the case of
> wakeup-latency sensitivity you are optimizing for; only reduces
> rb-tree traversal if there is no chance of pick_eevdf() returning "pse"
> https://lore.kernel.org/lkml/[email protected]/
>
> --
> Thanks and Regards,
> Prateek
>
>>
>>
>> Of course, this would break the semantics of RESPECT_SLICE as well as
>> RUN_TO_PARITY. So, this might be considered a performance enhancement
>> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>>
>> thanks
>> Chunxin
>>
>>
>>> If I understand your requirement correctly, you want to reduce the wakeup
>>> latency. There are some codes under developed by Peter, which could
>>> customized task's wakeup latency via setting its slice:
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> thanks,
>>> Chenyu



2024-05-28 07:47:50

by K Prateek Nayak

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

Hello Chunxin,

On 5/28/2024 12:48 PM, Chunxin Zang wrote:
> [..snip..]
>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> This check uses the root cfs_rq since "task_cfs_rq()" returns the
>> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
>> CONFIG_SCHED_AUTOGROUP, there is a good chance this the task is queued
>> on a higher order cfs_rq and this entity_eligible() calculation might
>> not be valid since the vruntime calculation for the "se" is relative to
>> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
>> I believe that is what Chenyu was referring to in [1].
>
>
> Thank you for explaining so much to me; I am trying to understand all of this. :)
>
>>
>>> find_matching_se(&se, &pse);
>>> WARN_ON_ONCE(!pse);
>>>
>>> --
>>
>> In addition to that, There is an update_curr() call below for the first
>> cfs_rq where both the entities' hierarchy is queued which is found by
>> find_matching_se(). I believe that is required too to update the
>> vruntime and deadline of the entity where preemption can happen.
>>
>> If you want to circumvent a second call to pick_eevdf(), could you
>> perhaps do:
>>
>> (Only build tested)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9eb63573110c..653b1bee1e62 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> update_curr(cfs_rq);
>>
>> /*
>> - * XXX pick_eevdf(cfs_rq) != se ?
>> + * If the hierarchy of current task is ineligible at the common
>> + * point on the newly woken entity, there is a good chance of
>> + * wakeup preemption by the newly woken entity. Mark for resched
>> + * and allow pick_eevdf() in schedule() to judge which task to
>> + * run next.
>> */
>> - if (pick_eevdf(cfs_rq) == pse)
>> + if (!entity_eligible(cfs_rq, se))
>> goto preempt;
>>
>> return;
>>
>> --
>>
>> There are other implications here which is specifically highlighted by
>> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
>> entity is not the entity with the earliest eligible virtual deadline,
>> the current task is still preempted if any other entity has the EEVD.
>>
>> Mike's box gave switching to above two thumbs up; I have to check what
>> my box says :)
>>
>> Following are DeathStarBench results with your original patch compared
>> to v6.9-rc5 based tip:sched/core:
>>
>> ==================================================================
>> Test : DeathStarBench
>> Why? : Some tasks here do no like aggressive preemption
>> Units : Normalized throughput
>> Interpretation: Higher is better
>> Statistic : Mean
>> ==================================================================
>> Pinning scaling tip eager_preempt (pct imp)
>> 1CCD 1 1.00 0.99 (%diff: -1.13%)
>> 2CCD 2 1.00 0.97 (%diff: -3.21%)
>> 4CCD 3 1.00 0.97 (%diff: -3.41%)
>> 8CCD 6 1.00 0.97 (%diff: -3.20%)
>> --
>
> Please forgive me as I have not used the DeathStarBench suite before. Does
> this test result indicate that my modifications have resulted in tasks that do no
> like aggressive preemption being even less likely to be preempted?

It is actually the opposite. In the case of DeathStarBench, the nginx server
tasks responsible for being the entrypoint into the microservice chain do
not like to be preempted. A regression generally indicates that these tasks
have very likely been preempted, as a result of which the throughput drops.
More information on DeathStarBench and the problem is highlighted in
https://lore.kernel.org/lkml/[email protected]/

I'll test with more workloads later today and update the thread. Please
forgive any delay; I'm slowly crawling through a backlog of testing.

--
Thanks and Regards,
Prateek

>
> thanks
> Chunxin
>
>> I'll give the variants mentioned in the thread a try too to see if
>> some of my assumptions around heavy preemption hold good. I was also
>> able to dig up an old patch by Balakumaran Kannan which skipped
>> pick_eevdf() altogether if "pse" is ineligible which also seems like
>> a good optimization based on current check in
>> check_preempt_wakeup_fair() but it perhaps doesn't help the case of
>> wakeup-latency sensitivity you are optimizing for; only reduces
>> rb-tree traversal if there is no chance of pick_eevdf() returning "pse"
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> [..snip..]
>>

2024-05-29 06:28:08

by kernel test robot

Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



Hello,

kernel test robot noticed a -11.8% regression of netperf.Throughput_Mbps on:


commit: e2bbd1c498980c5cb68f9973f418ae09f353258d ("[PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible")
url: https://github.com/intel-lab-lkp/linux/commits/Chunxin-Zang/sched-fair-Reschedule-the-cfs_rq-when-current-is-ineligible/20240524-214314
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 97450eb909658573dcacc1063b06d3d08642c0c1
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

testcase: netperf
test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
parameters:

ip: ipv4
runtime: 300s
nr_threads: 200%
cluster: cs-localhost
test: UDP_STREAM
cpufreq_governor: performance


In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -3.9% regression |
| test machine | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
| test parameters | cpufreq_governor=performance |
| | disk=1HDD |
| | fs=ext4 |
| | nr_threads=100% |
| | test=fstat |
| | testtime=60s |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | aim7: aim7.jobs-per-min 9.6% improvement |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters | cpufreq_governor=performance |
| | disk=4BRD_12G |
| | fs=xfs |
| | load=300 |
| | md=RAID1 |
| | test=sync_disk_rw |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | kbuild: kbuild.user_time_per_iteration 2.3% regression |
| test machine | 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory |
| test parameters | build_kconfig=defconfig |
| | cpufreq_governor=performance |
| | nr_task=200% |
| | runtime=300s |
| | target=vmlinux |
+------------------+----------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240529/[email protected]

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase:
cs-localhost/gcc-13/performance/ipv4/x86_64-rhel-8.3/200%/debian-12-x86_64-20240206.cgz/300s/lkp-icl-2sp2/UDP_STREAM/netperf

commit:
97450eb909 ("sched/pelt: Remove shift of thermal clock")
e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41
---------------- ---------------------------
%stddev %change %stddev
\ | \
0.46 ? 5% +0.1 0.56 ? 4% mpstat.cpu.all.irq%
1867628 ? 2% -12.6% 1632289 ? 8% meminfo.Active
1867580 ? 2% -12.6% 1632257 ? 8% meminfo.Active(anon)
1865825 ? 2% -12.7% 1629647 ? 9% numa-meminfo.node1.Active
1865809 ? 2% -12.7% 1629633 ? 9% numa-meminfo.node1.Active(anon)
68.00 ? 8% +136.3% 160.67 ? 18% perf-c2c.DRAM.local
2951 ? 9% +98.5% 5858 perf-c2c.DRAM.remote
7054758 -5.6% 6656686 vmstat.system.cs
192398 -9.7% 173722 vmstat.system.in
1.632e+09 -10.7% 1.458e+09 numa-numastat.node0.local_node
1.633e+09 -10.7% 1.458e+09 numa-numastat.node0.numa_hit
1.632e+09 -11.4% 1.446e+09 numa-numastat.node1.local_node
1.633e+09 -11.4% 1.447e+09 numa-numastat.node1.numa_hit
1.633e+09 -10.7% 1.458e+09 numa-vmstat.node0.numa_hit
1.632e+09 -10.7% 1.458e+09 numa-vmstat.node0.numa_local
466378 ? 2% -12.6% 407484 ? 8% numa-vmstat.node1.nr_active_anon
466377 ? 2% -12.6% 407484 ? 8% numa-vmstat.node1.nr_zone_active_anon
1.633e+09 -11.4% 1.447e+09 numa-vmstat.node1.numa_hit
1.632e+09 -11.4% 1.446e+09 numa-vmstat.node1.numa_local
467142 ? 3% -12.7% 407846 ? 9% proc-vmstat.nr_active_anon
31481 +2.0% 32110 proc-vmstat.nr_kernel_stack
467142 ? 3% -12.7% 407846 ? 9% proc-vmstat.nr_zone_active_anon
3.266e+09 -11.0% 2.905e+09 proc-vmstat.numa_hit
3.264e+09 -11.0% 2.904e+09 proc-vmstat.numa_local
2.608e+10 -11.0% 2.32e+10 proc-vmstat.pgalloc_normal
2.608e+10 -11.0% 2.32e+10 proc-vmstat.pgfree
29563 -10.1% 26584 netperf.ThroughputBoth_Mbps
7563274 -10.3% 6783505 netperf.ThroughputBoth_total_Mbps
7788 -5.1% 7388 netperf.ThroughputRecv_Mbps
1992482 -5.4% 1885347 netperf.ThroughputRecv_total_Mbps
21775 -11.8% 19196 netperf.Throughput_Mbps
5570791 -12.1% 4898158 netperf.Throughput_total_Mbps
1.083e+09 -5.3% 1.025e+09 netperf.time.involuntary_context_switches
8403 -3.4% 8116 netperf.time.percent_of_cpu_this_job_got
24883 -3.5% 24000 netperf.time.system_time
789.48 +1.2% 799.09 netperf.time.user_time
4.33e+09 -10.3% 3.883e+09 netperf.workload
4.31 ? 4% +11.6% 4.81 ? 4% sched_debug.cfs_rq:/.h_nr_running.max
0.68 ? 3% +12.6% 0.77 sched_debug.cfs_rq:/.h_nr_running.stddev
16.51 ? 12% +19.3% 19.70 ? 6% sched_debug.cfs_rq:/.load_avg.avg
5.04 ? 34% +60.0% 8.07 ? 17% sched_debug.cfs_rq:/.removed.load_avg.avg
28.02 ? 21% +27.6% 35.75 ? 7% sched_debug.cfs_rq:/.removed.load_avg.stddev
2.48 ? 35% +48.3% 3.68 ? 9% sched_debug.cfs_rq:/.removed.runnable_avg.avg
2.48 ? 35% +48.3% 3.68 ? 9% sched_debug.cfs_rq:/.removed.util_avg.avg
114.64 ? 8% -10.3% 102.79 ? 4% sched_debug.cfs_rq:/.util_avg.stddev
36.81 ? 10% +50.7% 55.47 ? 11% sched_debug.cpu.clock.stddev
0.00 ? 6% +43.3% 0.00 ? 9% sched_debug.cpu.next_balance.stddev
4.31 ? 4% +10.3% 4.75 ? 3% sched_debug.cpu.nr_running.max
0.68 ? 3% +12.4% 0.76 ? 2% sched_debug.cpu.nr_running.stddev
7177076 ? 2% -9.9% 6466454 ? 4% sched_debug.cpu.nr_switches.min
0.23 ? 88% +290.5% 0.92 ? 24% sched_debug.rt_rq:.rt_time.avg
30.05 ? 88% +290.5% 117.32 ? 24% sched_debug.rt_rq:.rt_time.max
2.65 ? 88% +290.5% 10.33 ? 24% sched_debug.rt_rq:.rt_time.stddev
1.39 ? 3% +232.7% 4.63 perf-stat.i.MPKI
2.345e+10 -9.9% 2.113e+10 perf-stat.i.branch-instructions
1.05 +0.0 1.09 perf-stat.i.branch-miss-rate%
2.419e+08 -6.1% 2.27e+08 perf-stat.i.branch-misses
4.04 ? 3% +5.9 9.96 perf-stat.i.cache-miss-rate%
1.769e+08 ? 3% +200.3% 5.312e+08 perf-stat.i.cache-misses
4.43e+09 +20.9% 5.355e+09 perf-stat.i.cache-references
7118377 -5.9% 6699288 perf-stat.i.context-switches
2.32 +10.4% 2.56 perf-stat.i.cpi
1759 ? 4% -63.3% 644.95 perf-stat.i.cycles-between-cache-misses
1.271e+11 -9.9% 1.145e+11 perf-stat.i.instructions
0.44 -9.1% 0.40 perf-stat.i.ipc
55.61 -6.0% 52.29 perf-stat.i.metric.K/sec
1.39 ? 3% +233.2% 4.64 perf-stat.overall.MPKI
1.03 +0.0 1.07 perf-stat.overall.branch-miss-rate%
3.99 ? 3% +5.9 9.91 perf-stat.overall.cache-miss-rate%
2.31 +10.3% 2.55 perf-stat.overall.cpi
1658 ? 3% -66.9% 548.75 perf-stat.overall.cycles-between-cache-misses
0.43 -9.4% 0.39 perf-stat.overall.ipc
2.337e+10 -10.0% 2.103e+10 perf-stat.ps.branch-instructions
2.41e+08 -6.2% 2.26e+08 perf-stat.ps.branch-misses
1.764e+08 ? 3% +199.8% 5.29e+08 perf-stat.ps.cache-misses
4.417e+09 +20.8% 5.336e+09 perf-stat.ps.cache-references
7096909 -6.0% 6672410 perf-stat.ps.context-switches
1.266e+11 -10.0% 1.14e+11 perf-stat.ps.instructions
3.879e+13 -10.0% 3.492e+13 perf-stat.total.instructions
67.36 -3.1 64.24 perf-profile.calltrace.cycles-pp.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto
67.58 -3.1 64.47 perf-profile.calltrace.cycles-pp.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
71.02 -3.0 68.03 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream
64.63 -2.9 61.72 perf-profile.calltrace.cycles-pp.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
72.79 -2.9 69.94 perf-profile.calltrace.cycles-pp.send_omni_inner.send_udp_stream.main
72.82 -2.8 69.97 perf-profile.calltrace.cycles-pp.send_udp_stream.main
71.47 -2.8 68.64 perf-profile.calltrace.cycles-pp.sendto.send_omni_inner.send_udp_stream.main
70.66 -2.8 67.87 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner.send_udp_stream.main
45.90 -1.2 44.69 perf-profile.calltrace.cycles-pp.ip_make_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
15.20 -1.2 14.03 perf-profile.calltrace.cycles-pp.udp_send_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
14.76 -1.1 13.62 perf-profile.calltrace.cycles-pp.ip_send_skb.udp_send_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto
13.76 -1.1 12.70 perf-profile.calltrace.cycles-pp.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg.__sys_sendto
13.10 -1.0 12.09 perf-profile.calltrace.cycles-pp.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb.udp_sendmsg
10.87 -0.7 10.16 perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
10.78 -0.7 10.07 perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.ip_send_skb
10.64 -0.7 9.94 perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
34.84 -0.6 34.20 perf-profile.calltrace.cycles-pp.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
4.83 ? 4% -0.6 4.19 perf-profile.calltrace.cycles-pp.__ip_make_skb.ip_make_skb.udp_sendmsg.__sys_sendto.__x64_sys_sendto
9.72 -0.6 9.10 perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
4.34 ? 4% -0.6 3.73 perf-profile.calltrace.cycles-pp.__ip_select_ident.__ip_make_skb.ip_make_skb.udp_sendmsg.__sys_sendto
9.41 -0.6 8.82 perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
9.33 -0.6 8.74 perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
33.97 -0.6 33.42 perf-profile.calltrace.cycles-pp._copy_from_iter.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
8.70 -0.5 8.17 perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
7.36 -0.5 6.84 ? 2% perf-profile.calltrace.cycles-pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action
7.29 -0.5 6.78 ? 2% perf-profile.calltrace.cycles-pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__napi_poll
7.02 -0.5 6.54 ? 2% perf-profile.calltrace.cycles-pp.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog
5.91 -0.4 5.56 ? 2% perf-profile.calltrace.cycles-pp.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
5.81 -0.3 5.47 ? 2% perf-profile.calltrace.cycles-pp.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
0.55 -0.3 0.25 ?100% perf-profile.calltrace.cycles-pp.irqtime_account_irq.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
1.42 -0.2 1.22 perf-profile.calltrace.cycles-pp.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_send_skb.udp_send_skb
1.26 -0.2 1.08 perf-profile.calltrace.cycles-pp.loopback_xmit.dev_hard_start_xmit.__dev_queue_xmit.ip_finish_output2.ip_send_skb
1.49 -0.2 1.31 perf-profile.calltrace.cycles-pp.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv.ip_protocol_deliver_rcu
1.45 -0.2 1.27 perf-profile.calltrace.cycles-pp.skb_release_data.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb.__udp4_lib_rcv
1.94 ? 2% -0.2 1.77 perf-profile.calltrace.cycles-pp.ip_route_output_flow.udp_sendmsg.__sys_sendto.__x64_sys_sendto.do_syscall_64
1.54 -0.2 1.38 perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_sync_key
1.38 ? 3% -0.1 1.26 perf-profile.calltrace.cycles-pp.ip_route_output_key_hash_rcu.ip_route_output_flow.udp_sendmsg.__sys_sendto.__x64_sys_sendto
1.22 ? 3% -0.1 1.12 perf-profile.calltrace.cycles-pp.fib_table_lookup.ip_route_output_key_hash_rcu.ip_route_output_flow.udp_sendmsg.__sys_sendto
1.72 -0.1 1.63 perf-profile.calltrace.cycles-pp.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
0.71 -0.1 0.62 perf-profile.calltrace.cycles-pp.__udp4_lib_lookup.__udp4_lib_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
1.35 -0.1 1.26 perf-profile.calltrace.cycles-pp.alloc_skb_with_frags.sock_alloc_send_pskb.__ip_append_data.ip_make_skb.udp_sendmsg
0.66 -0.1 0.59 perf-profile.calltrace.cycles-pp.__check_object_size.ip_generic_getfrag.__ip_append_data.ip_make_skb.udp_sendmsg
0.79 -0.1 0.71 perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.kfree_skb_reason.udp_queue_rcv_one_skb.udp_unicast_rcv_skb
1.26 -0.1 1.19 perf-profile.calltrace.cycles-pp.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.__ip_append_data.ip_make_skb
0.62 -0.1 0.57 perf-profile.calltrace.cycles-pp.move_addr_to_kernel.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.14 -0.0 1.11 perf-profile.calltrace.cycles-pp.recvfrom
0.54 -0.0 0.52 perf-profile.calltrace.cycles-pp.sockfd_lookup_light.__sys_sendto.__x64_sys_sendto.do_syscall_64.entry_SYSCALL_64_after_hwframe
3.05 +0.1 3.13 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendto.send_omni_inner
0.91 ? 17% +0.2 1.07 perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
2.12 ? 2% +0.3 2.41 perf-profile.calltrace.cycles-pp.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg
2.17 ? 2% +0.3 2.46 perf-profile.calltrace.cycles-pp.sk_page_frag_refill.__ip_append_data.ip_make_skb.udp_sendmsg.__sys_sendto
1.65 ? 3% +0.3 1.95 perf-profile.calltrace.cycles-pp.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data.ip_make_skb
0.59 +0.3 0.89 perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill
1.50 ? 3% +0.3 1.81 perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill.__ip_append_data
1.19 ? 3% +0.3 1.52 perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.skb_page_frag_refill.sk_page_frag_refill
0.09 ?223% +0.4 0.53 ? 4% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeout
0.00 +0.6 0.55 perf-profile.calltrace.cycles-pp.free_unref_page_commit.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg
0.82 ? 19% +0.6 1.40 perf-profile.calltrace.cycles-pp.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg
0.84 ? 19% +0.6 1.40 perf-profile.calltrace.cycles-pp.__consume_stateless_skb.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
0.39 ? 70% +0.6 1.01 perf-profile.calltrace.cycles-pp.free_unref_page.skb_release_data.__consume_stateless_skb.udp_recvmsg.inet_recvmsg
14.08 +2.3 16.40 perf-profile.calltrace.cycles-pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg
14.82 +2.4 17.18 perf-profile.calltrace.cycles-pp.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom
14.78 +2.4 17.16 perf-profile.calltrace.cycles-pp.__skb_datagram_iter.skb_copy_datagram_iter.udp_recvmsg.inet_recvmsg.sock_recvmsg
21.41 +2.8 24.22 perf-profile.calltrace.cycles-pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64
21.60 +2.8 24.42 perf-profile.calltrace.cycles-pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe
21.33 +2.8 24.14 perf-profile.calltrace.cycles-pp.udp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom
22.52 +2.8 25.36 perf-profile.calltrace.cycles-pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom
21.96 +2.9 24.81 perf-profile.calltrace.cycles-pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni
23.14 +2.9 26.02 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests.spawn_child
23.11 +2.9 25.99 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recvfrom.recv_omni.process_requests
23.45 +2.9 26.33 perf-profile.calltrace.cycles-pp.recvfrom.recv_omni.process_requests.spawn_child.accept_connection
23.94 +2.9 26.87 perf-profile.calltrace.cycles-pp.recv_omni.process_requests.spawn_child.accept_connection.accept_connections
21.17 ? 18% +5.7 26.88 perf-profile.calltrace.cycles-pp.accept_connection.accept_connections.main
21.17 ? 18% +5.7 26.88 perf-profile.calltrace.cycles-pp.accept_connections.main
21.17 ? 18% +5.7 26.88 perf-profile.calltrace.cycles-pp.process_requests.spawn_child.accept_connection.accept_connections.main
21.17 ? 18% +5.7 26.88 perf-profile.calltrace.cycles-pp.spawn_child.accept_connection.accept_connections.main
73.35 -3.0 70.38 perf-profile.children.cycles-pp.send_udp_stream
73.55 -2.9 70.64 perf-profile.children.cycles-pp.sendto
73.42 -2.9 70.53 perf-profile.children.cycles-pp.send_omni_inner
67.53 -2.9 64.66 perf-profile.children.cycles-pp.__sys_sendto
67.74 -2.9 64.88 perf-profile.children.cycles-pp.__x64_sys_sendto
64.82 -2.7 62.13 perf-profile.children.cycles-pp.udp_sendmsg
46.18 -1.2 44.97 perf-profile.children.cycles-pp.ip_make_skb
15.36 -1.2 14.17 perf-profile.children.cycles-pp.udp_send_skb
14.91 -1.2 13.76 perf-profile.children.cycles-pp.ip_send_skb
13.90 -1.1 12.82 perf-profile.children.cycles-pp.ip_finish_output2
13.25 -1.0 12.23 perf-profile.children.cycles-pp.__dev_queue_xmit
11.02 -0.7 10.29 perf-profile.children.cycles-pp.__local_bh_enable_ip
10.90 -0.7 10.18 perf-profile.children.cycles-pp.do_softirq
10.77 -0.7 10.08 perf-profile.children.cycles-pp.__do_softirq
35.12 -0.7 34.47 perf-profile.children.cycles-pp.ip_generic_getfrag
4.90 ? 4% -0.6 4.25 perf-profile.children.cycles-pp.__ip_make_skb
9.82 -0.6 9.19 perf-profile.children.cycles-pp.net_rx_action
4.39 ? 4% -0.6 3.77 perf-profile.children.cycles-pp.__ip_select_ident
9.51 -0.6 8.90 perf-profile.children.cycles-pp.__napi_poll
9.43 -0.6 8.84 perf-profile.children.cycles-pp.process_backlog
34.24 -0.6 33.68 perf-profile.children.cycles-pp._copy_from_iter
40.92 -0.5 40.38 perf-profile.children.cycles-pp.__ip_append_data
8.79 -0.5 8.26 perf-profile.children.cycles-pp.__netif_receive_skb_one_core
7.44 -0.5 6.91 ? 2% perf-profile.children.cycles-pp.ip_local_deliver_finish
7.37 -0.5 6.86 ? 2% perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
7.11 -0.5 6.62 ? 2% perf-profile.children.cycles-pp.__udp4_lib_rcv
5.97 -0.4 5.61 ? 2% perf-profile.children.cycles-pp.udp_unicast_rcv_skb
5.91 -0.4 5.55 ? 2% perf-profile.children.cycles-pp.udp_queue_rcv_one_skb
1.45 -0.2 1.25 perf-profile.children.cycles-pp.dev_hard_start_xmit
1.32 -0.2 1.13 perf-profile.children.cycles-pp.loopback_xmit
1.52 -0.2 1.33 perf-profile.children.cycles-pp.kfree_skb_reason
1.97 ? 2% -0.2 1.80 perf-profile.children.cycles-pp.ip_route_output_flow
1.56 -0.2 1.40 perf-profile.children.cycles-pp.ttwu_do_activate
0.27 -0.1 0.15 perf-profile.children.cycles-pp.wakeup_preempt
1.40 ? 3% -0.1 1.28 perf-profile.children.cycles-pp.ip_route_output_key_hash_rcu
0.18 -0.1 0.06 ? 6% perf-profile.children.cycles-pp.check_preempt_wakeup_fair
1.24 ? 3% -0.1 1.14 perf-profile.children.cycles-pp.fib_table_lookup
0.74 -0.1 0.64 perf-profile.children.cycles-pp.__udp4_lib_lookup
1.75 -0.1 1.66 perf-profile.children.cycles-pp.sock_alloc_send_pskb
1.38 -0.1 1.29 perf-profile.children.cycles-pp.alloc_skb_with_frags
1.30 -0.1 1.22 perf-profile.children.cycles-pp.__alloc_skb
0.36 -0.1 0.29 perf-profile.children.cycles-pp.sock_wfree
0.53 ? 2% -0.1 0.46 perf-profile.children.cycles-pp.__netif_rx
1.22 -0.1 1.14 perf-profile.children.cycles-pp.__check_object_size
0.50 -0.1 0.43 perf-profile.children.cycles-pp.netif_rx_internal
0.47 ? 2% -0.1 0.40 perf-profile.children.cycles-pp.enqueue_to_backlog
0.51 -0.1 0.44 ? 2% perf-profile.children.cycles-pp.udp4_lib_lookup2
0.32 ? 2% -0.1 0.26 perf-profile.children.cycles-pp.pick_eevdf
0.65 -0.1 0.59 perf-profile.children.cycles-pp.move_addr_to_kernel
0.59 -0.1 0.53 perf-profile.children.cycles-pp.irqtime_account_irq
0.83 -0.1 0.78 perf-profile.children.cycles-pp.kmem_cache_alloc_node
0.56 -0.1 0.50 perf-profile.children.cycles-pp.sched_clock_cpu
0.35 -0.0 0.30 perf-profile.children.cycles-pp.validate_xmit_skb
0.48 -0.0 0.44 perf-profile.children.cycles-pp.sched_clock
0.40 -0.0 0.36 perf-profile.children.cycles-pp._raw_spin_trylock
0.40 -0.0 0.35 perf-profile.children.cycles-pp.reweight_entity
0.46 -0.0 0.42 perf-profile.children.cycles-pp._copy_from_user
0.94 -0.0 0.90 perf-profile.children.cycles-pp.update_load_avg
1.11 -0.0 1.07 perf-profile.children.cycles-pp.switch_mm_irqs_off
0.36 -0.0 0.32 perf-profile.children.cycles-pp._raw_spin_lock_irq
0.40 ? 2% -0.0 0.36 perf-profile.children.cycles-pp.kmalloc_reserve
0.43 -0.0 0.39 perf-profile.children.cycles-pp.native_sched_clock
0.65 -0.0 0.61 perf-profile.children.cycles-pp.kmem_cache_free
1.27 -0.0 1.23 ? 2% perf-profile.children.cycles-pp.activate_task
1.23 -0.0 1.20 ? 2% perf-profile.children.cycles-pp.enqueue_task_fair
0.48 -0.0 0.45 perf-profile.children.cycles-pp.__virt_addr_valid
0.78 -0.0 0.75 perf-profile.children.cycles-pp.check_heap_object
0.17 ? 2% -0.0 0.14 ? 2% perf-profile.children.cycles-pp.destroy_large_folio
0.38 -0.0 0.35 perf-profile.children.cycles-pp.entry_SYSCALL_64
0.33 -0.0 0.30 perf-profile.children.cycles-pp.__cond_resched
0.12 -0.0 0.09 ? 4% perf-profile.children.cycles-pp.__mem_cgroup_uncharge
0.36 -0.0 0.33 perf-profile.children.cycles-pp.__mkroute_output
0.30 -0.0 0.27 perf-profile.children.cycles-pp.ip_output
0.56 -0.0 0.53 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.24 -0.0 0.21 ? 2% perf-profile.children.cycles-pp.ip_setup_cork
0.18 -0.0 0.16 ? 3% perf-profile.children.cycles-pp.ipv4_pktinfo_prepare
0.27 -0.0 0.24 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
0.18 ? 2% -0.0 0.16 perf-profile.children.cycles-pp.netif_skb_features
0.27 -0.0 0.25 ? 2% perf-profile.children.cycles-pp.get_pfnblock_flags_mask
0.51 -0.0 0.49 perf-profile.children.cycles-pp.__netif_receive_skb_core
0.26 -0.0 0.24 ? 2% perf-profile.children.cycles-pp.__ip_local_out
0.17 ? 2% -0.0 0.15 ? 2% perf-profile.children.cycles-pp.dst_release
0.15 -0.0 0.13 perf-profile.children.cycles-pp.update_curr_se
0.12 -0.0 0.10 perf-profile.children.cycles-pp.rcu_all_qs
0.10 ? 5% -0.0 0.08 ? 6% perf-profile.children.cycles-pp.vruntime_eligible
0.13 ? 4% -0.0 0.11 ? 6% perf-profile.children.cycles-pp.security_sock_rcv_skb
0.29 ? 3% -0.0 0.27 perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.15 -0.0 0.13 ? 2% perf-profile.children.cycles-pp.ip_send_check
0.19 ? 2% -0.0 0.18 ? 2% perf-profile.children.cycles-pp.siphash_3u32
0.21 ? 3% -0.0 0.19 perf-profile.children.cycles-pp.sk_filter_trim_cap
0.14 ? 3% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.__folio_put
0.20 ? 2% -0.0 0.18 perf-profile.children.cycles-pp.udp4_csum_init
0.21 ? 2% -0.0 0.19 ? 2% perf-profile.children.cycles-pp.ipv4_mtu
0.25 ? 2% -0.0 0.23 perf-profile.children.cycles-pp.skb_set_owner_w
0.26 -0.0 0.24 perf-profile.children.cycles-pp.rseq_update_cpu_node_id
0.14 ? 2% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.__ip_finish_output
0.30 -0.0 0.29 perf-profile.children.cycles-pp.__update_load_avg_se
0.15 ? 2% -0.0 0.13 ? 3% perf-profile.children.cycles-pp.avg_vruntime
0.08 -0.0 0.07 ? 7% perf-profile.children.cycles-pp.skb_network_protocol
0.14 -0.0 0.13 ? 2% perf-profile.children.cycles-pp.check_stack_object
0.13 ? 2% -0.0 0.12 perf-profile.children.cycles-pp.xfrm_lookup_route
0.14 -0.0 0.13 perf-profile.children.cycles-pp.__put_user_8
0.09 -0.0 0.08 perf-profile.children.cycles-pp.nf_hook_slow
0.09 -0.0 0.08 perf-profile.children.cycles-pp.raw_v4_input
0.06 -0.0 0.05 perf-profile.children.cycles-pp.validate_xmit_xfrm
0.11 -0.0 0.10 perf-profile.children.cycles-pp.xfrm_lookup_with_ifid
0.11 +0.0 0.12 perf-profile.children.cycles-pp.security_socket_recvmsg
0.06 +0.0 0.08 ? 6% perf-profile.children.cycles-pp.demo_interval_tick
0.08 ? 4% +0.0 0.10 ? 4% perf-profile.children.cycles-pp.__build_skb_around
0.07 ? 5% +0.0 0.09 perf-profile.children.cycles-pp.should_failslab
0.06 +0.0 0.08 ? 4% perf-profile.children.cycles-pp.skb_clone_tx_timestamp
0.37 +0.0 0.39 perf-profile.children.cycles-pp.simple_copy_to_iter
0.06 ? 7% +0.0 0.09 ? 4% perf-profile.children.cycles-pp.task_work_run
0.06 ? 6% +0.0 0.09 perf-profile.children.cycles-pp.task_mm_cid_work
0.21 ? 2% +0.0 0.24 ? 5% perf-profile.children.cycles-pp.recv_data
0.69 +0.0 0.73 ? 2% perf-profile.children.cycles-pp.ip_rcv
0.25 +0.0 0.30 ? 2% perf-profile.children.cycles-pp.ip_rcv_core
0.06 ? 13% +0.1 0.11 ? 3% perf-profile.children.cycles-pp.__free_one_page
0.85 +0.1 0.91 perf-profile.children.cycles-pp.switch_fpu_return
0.71 +0.1 0.78 ? 2% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.28 ? 5% +0.1 0.36 ? 2% perf-profile.children.cycles-pp.update_process_times
0.31 ? 5% +0.1 0.40 ? 2% perf-profile.children.cycles-pp.tick_nohz_handler
4.78 +0.1 4.88 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
0.34 ? 4% +0.1 0.44 ? 2% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.43 ? 6% +0.1 0.56 ? 5% perf-profile.children.cycles-pp.hrtimer_interrupt
0.46 ? 6% +0.1 0.59 ? 5% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.49 ? 5% +0.2 0.64 ? 5% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.58 ? 5% +0.2 0.75 ? 4% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
2.21 ? 2% +0.3 2.50 perf-profile.children.cycles-pp.sk_page_frag_refill
2.41 +0.3 2.69 perf-profile.children.cycles-pp.skb_release_data
2.15 ? 2% +0.3 2.44 perf-profile.children.cycles-pp.skb_page_frag_refill
1.70 ? 2% +0.3 1.99 perf-profile.children.cycles-pp.alloc_pages_mpol
0.62 ? 2% +0.3 0.92 ? 2% perf-profile.children.cycles-pp.rmqueue
1.54 ? 2% +0.3 1.85 perf-profile.children.cycles-pp.__alloc_pages
1.23 ? 3% +0.3 1.56 perf-profile.children.cycles-pp.get_page_from_freelist
0.08 ? 12% +0.3 0.41 ? 3% perf-profile.children.cycles-pp.rmqueue_bulk
0.18 ? 4% +0.3 0.52 ? 2% perf-profile.children.cycles-pp.__rmqueue_pcplist
1.40 ? 2% +0.4 1.75 perf-profile.children.cycles-pp.free_unref_page
0.09 ? 11% +0.4 0.47 ? 2% perf-profile.children.cycles-pp.free_pcppages_bulk
0.27 ? 7% +0.4 0.65 perf-profile.children.cycles-pp.free_unref_page_commit
0.94 ? 2% +0.5 1.41 perf-profile.children.cycles-pp.__consume_stateless_skb
0.52 +0.6 1.10 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.06 ? 9% +0.6 0.66 ? 3% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
14.10 +2.3 16.43 perf-profile.children.cycles-pp._copy_to_iter
14.80 +2.4 17.17 perf-profile.children.cycles-pp.__skb_datagram_iter
14.82 +2.4 17.19 perf-profile.children.cycles-pp.skb_copy_datagram_iter
21.42 +2.8 24.24 perf-profile.children.cycles-pp.inet_recvmsg
21.61 +2.8 24.43 perf-profile.children.cycles-pp.sock_recvmsg
21.34 +2.8 24.16 perf-profile.children.cycles-pp.udp_recvmsg
22.59 +2.8 25.43 perf-profile.children.cycles-pp.__x64_sys_recvfrom
22.53 +2.8 25.37 perf-profile.children.cycles-pp.__sys_recvfrom
24.70 +2.9 27.55 perf-profile.children.cycles-pp.recvfrom
23.94 +2.9 26.88 perf-profile.children.cycles-pp.accept_connection
23.94 +2.9 26.88 perf-profile.children.cycles-pp.accept_connections
23.94 +2.9 26.88 perf-profile.children.cycles-pp.process_requests
23.94 +2.9 26.88 perf-profile.children.cycles-pp.spawn_child
23.94 +2.9 26.88 perf-profile.children.cycles-pp.recv_omni
34.03 -0.6 33.41 perf-profile.self.cycles-pp._copy_from_iter
4.17 ? 5% -0.6 3.56 ? 2% perf-profile.self.cycles-pp.__ip_select_ident
0.87 -0.1 0.77 perf-profile.self.cycles-pp.__sys_sendto
1.14 -0.1 1.04 perf-profile.self.cycles-pp.udp_sendmsg
0.92 ? 3% -0.1 0.84 ? 2% perf-profile.self.cycles-pp.fib_table_lookup
0.36 -0.1 0.28 perf-profile.self.cycles-pp.sock_wfree
1.80 -0.1 1.72 perf-profile.self.cycles-pp.__ip_append_data
0.31 -0.1 0.26 perf-profile.self.cycles-pp.loopback_xmit
0.63 -0.1 0.58 ? 2% perf-profile.self.cycles-pp.kmem_cache_alloc_node
0.30 -0.0 0.26 perf-profile.self.cycles-pp.udp4_lib_lookup2
0.46 -0.0 0.41 perf-profile.self.cycles-pp.do_syscall_64
0.58 -0.0 0.54 perf-profile.self.cycles-pp.ip_finish_output2
0.38 -0.0 0.34 perf-profile.self.cycles-pp._raw_spin_trylock
0.45 -0.0 0.41 perf-profile.self.cycles-pp._copy_from_user
0.37 -0.0 0.33 perf-profile.self.cycles-pp.udp_send_skb
0.34 -0.0 0.30 ? 2% perf-profile.self.cycles-pp.__alloc_skb
0.39 ? 3% -0.0 0.36 perf-profile.self.cycles-pp.__dev_queue_xmit
0.35 -0.0 0.31 perf-profile.self.cycles-pp._raw_spin_lock_irq
0.42 -0.0 0.38 perf-profile.self.cycles-pp.native_sched_clock
1.09 -0.0 1.06 perf-profile.self.cycles-pp.switch_mm_irqs_off
0.46 ? 2% -0.0 0.43 perf-profile.self.cycles-pp.__virt_addr_valid
0.32 -0.0 0.29 perf-profile.self.cycles-pp.__mkroute_output
0.23 ? 4% -0.0 0.20 ? 2% perf-profile.self.cycles-pp.pick_eevdf
0.23 ? 2% -0.0 0.20 ? 2% perf-profile.self.cycles-pp.__udp4_lib_lookup
0.46 -0.0 0.43 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.21 -0.0 0.18 ? 2% perf-profile.self.cycles-pp.ip_route_output_flow
0.06 -0.0 0.03 ? 70% perf-profile.self.cycles-pp.check_preempt_wakeup_fair
0.26 -0.0 0.23 ? 2% perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
0.29 -0.0 0.26 perf-profile.self.cycles-pp.net_rx_action
0.11 ? 4% -0.0 0.09 ? 4% perf-profile.self.cycles-pp.__mem_cgroup_uncharge
0.29 ? 2% -0.0 0.26 perf-profile.self.cycles-pp.__check_object_size
0.50 -0.0 0.48 perf-profile.self.cycles-pp.__netif_receive_skb_core
0.28 -0.0 0.26 ? 2% perf-profile.self.cycles-pp.process_backlog
0.22 ? 2% -0.0 0.19 perf-profile.self.cycles-pp.ip_output
0.35 -0.0 0.33 perf-profile.self.cycles-pp.__ip_make_skb
0.38 -0.0 0.36 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.26 -0.0 0.24 ? 2% perf-profile.self.cycles-pp.get_pfnblock_flags_mask
0.47 -0.0 0.45 perf-profile.self.cycles-pp.kmem_cache_free
0.17 ? 2% -0.0 0.15 ? 3% perf-profile.self.cycles-pp.reweight_entity
0.22 -0.0 0.20 perf-profile.self.cycles-pp.__udp4_lib_rcv
0.14 -0.0 0.12 perf-profile.self.cycles-pp.validate_xmit_skb
0.15 -0.0 0.13 perf-profile.self.cycles-pp.dst_release
0.36 -0.0 0.34 perf-profile.self.cycles-pp.__do_softirq
0.10 ? 3% -0.0 0.08 perf-profile.self.cycles-pp.rcu_all_qs
0.08 -0.0 0.06 ? 6% perf-profile.self.cycles-pp.vruntime_eligible
0.24 ? 2% -0.0 0.23 ? 4% perf-profile.self.cycles-pp.__udp_enqueue_schedule_skb
0.16 ? 4% -0.0 0.14 ? 3% perf-profile.self.cycles-pp.enqueue_to_backlog
0.19 -0.0 0.17 perf-profile.self.cycles-pp.siphash_3u32
0.25 -0.0 0.24 perf-profile.self.cycles-pp.rseq_update_cpu_node_id
0.14 -0.0 0.12 ? 3% perf-profile.self.cycles-pp.ip_send_check
0.14 -0.0 0.12 ? 3% perf-profile.self.cycles-pp.ip_setup_cork
0.28 -0.0 0.26 perf-profile.self.cycles-pp.__update_load_avg_se
0.16 -0.0 0.14 ? 3% perf-profile.self.cycles-pp.ip_route_output_key_hash_rcu
0.19 -0.0 0.17 ? 2% perf-profile.self.cycles-pp.irqtime_account_irq
0.19 -0.0 0.17 ? 2% perf-profile.self.cycles-pp.udp4_csum_init
0.17 -0.0 0.16 ? 3% perf-profile.self.cycles-pp.ip_generic_getfrag
0.23 ? 2% -0.0 0.21 ? 3% perf-profile.self.cycles-pp.ip_send_skb
0.13 -0.0 0.12 ? 4% perf-profile.self.cycles-pp.update_curr_se
0.12 -0.0 0.10 ? 4% perf-profile.self.cycles-pp.__netif_receive_skb_one_core
0.08 ? 8% -0.0 0.06 ? 7% perf-profile.self.cycles-pp.__sk_mem_raise_allocated
0.11 ? 3% -0.0 0.09 ? 5% perf-profile.self.cycles-pp.security_sock_rcv_skb
0.06 ? 7% -0.0 0.05 perf-profile.self.cycles-pp.ip_local_deliver_finish
0.20 -0.0 0.19 ? 2% perf-profile.self.cycles-pp.__cond_resched
0.20 ? 3% -0.0 0.18 ? 2% perf-profile.self.cycles-pp.ipv4_mtu
0.30 -0.0 0.28 ? 2% perf-profile.self.cycles-pp.__alloc_pages
0.14 ? 2% -0.0 0.13 ? 2% perf-profile.self.cycles-pp.do_softirq
0.07 ? 5% -0.0 0.06 perf-profile.self.cycles-pp.skb_network_protocol
0.13 -0.0 0.12 perf-profile.self.cycles-pp.__wrgsbase_inactive
0.13 -0.0 0.12 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.14 -0.0 0.13 perf-profile.self.cycles-pp.ip_make_skb
0.13 -0.0 0.12 perf-profile.self.cycles-pp.move_addr_to_kernel
0.06 -0.0 0.05 perf-profile.self.cycles-pp.__ip_finish_output
0.14 ? 2% +0.0 0.15 ? 2% perf-profile.self.cycles-pp.enqueue_task_fair
0.12 ? 3% +0.0 0.14 ? 5% perf-profile.self.cycles-pp.recvfrom
0.08 ? 6% +0.0 0.09 perf-profile.self.cycles-pp.__build_skb_around
0.28 +0.0 0.30 perf-profile.self.cycles-pp.udp_recvmsg
0.06 ? 6% +0.0 0.08 perf-profile.self.cycles-pp.should_failslab
0.05 +0.0 0.07 ? 8% perf-profile.self.cycles-pp.demo_interval_tick
0.20 +0.0 0.22 ? 2% perf-profile.self.cycles-pp.__skb_datagram_iter
0.26 +0.0 0.28 perf-profile.self.cycles-pp.prepare_task_switch
0.06 +0.0 0.08 ? 5% perf-profile.self.cycles-pp.task_mm_cid_work
0.21 ? 3% +0.0 0.24 ? 4% perf-profile.self.cycles-pp.recv_omni
0.00 +0.1 0.05 perf-profile.self.cycles-pp.update_rq_clock
0.25 +0.1 0.30 ? 2% perf-profile.self.cycles-pp.ip_rcv_core
0.00 +0.1 0.06 ? 6% perf-profile.self.cycles-pp.skb_clone_tx_timestamp
0.04 ? 71% +0.1 0.10 perf-profile.self.cycles-pp.__free_one_page
0.01 ?223% +0.1 0.08 ? 6% perf-profile.self.cycles-pp.rmqueue_bulk
0.71 +0.1 0.78 perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.10 ? 3% +0.1 0.18 ? 3% perf-profile.self.cycles-pp.select_task_rq_fair
0.06 ? 9% +0.6 0.66 ? 3% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
14.03 +2.3 16.35 perf-profile.self.cycles-pp._copy_to_iter


***************************************************************************************************
lkp-icl-2sp8: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
gcc-13/performance/1HDD/ext4/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp8/fstat/stress-ng/60s

commit:
97450eb909 ("sched/pelt: Remove shift of thermal clock")
e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41
---------------- ---------------------------
%stddev %change %stddev
\ | \
5.96 ? 2% +22.6% 7.30 ? 3% iostat.cpu.user
6.10 ? 2% +1.4 7.50 ? 3% mpstat.cpu.all.usr%
728616 +22.2% 890173 vmstat.system.cs
1448236 ? 4% -81.4% 268871 ? 17% meminfo.Active
1441121 ? 4% -81.8% 261809 ? 18% meminfo.Active(anon)
6834186 +13.2% 7738594 meminfo.Inactive
6821947 +13.3% 7726478 meminfo.Inactive(anon)
152537 ? 31% -94.5% 8342 ? 61% numa-meminfo.node0.Active
150754 ? 32% -95.4% 6931 ? 78% numa-meminfo.node0.Active(anon)
1297193 ? 4% -79.9% 261111 ? 16% numa-meminfo.node1.Active
1291857 ? 4% -80.2% 255458 ? 16% numa-meminfo.node1.Active(anon)
37730 ? 32% -95.4% 1725 ? 77% numa-vmstat.node0.nr_active_anon
37730 ? 32% -95.4% 1725 ? 77% numa-vmstat.node0.nr_zone_active_anon
323570 ? 4% -80.5% 63052 ? 16% numa-vmstat.node1.nr_active_anon
323570 ? 4% -80.5% 63052 ? 16% numa-vmstat.node1.nr_zone_active_anon
4980068 -3.9% 4786411 stress-ng.fstat.ops
83000 -3.9% 79773 stress-ng.fstat.ops_per_sec
12565616 +88.8% 23722089 stress-ng.time.involuntary_context_switches
4457 +8.9% 4855 stress-ng.time.percent_of_cpu_this_job_got
2494 +7.3% 2678 stress-ng.time.system_time
183.10 ? 2% +30.7% 239.27 ? 2% stress-ng.time.user_time
7738050 -1.3% 7637067 stress-ng.time.voluntary_context_switches
1011067 +10.6% 1118668 sched_debug.cfs_rq:/.avg_vruntime.min
63124 ? 2% +24.2% 78376 ? 3% sched_debug.cfs_rq:/.avg_vruntime.stddev
1011067 +10.6% 1118668 sched_debug.cfs_rq:/.min_vruntime.min
63124 ? 2% +24.2% 78376 ? 3% sched_debug.cfs_rq:/.min_vruntime.stddev
779551 ? 5% -16.6% 649990 ? 14% sched_debug.cpu.curr->pid.avg
1406043 ? 2% -27.2% 1023836 sched_debug.cpu.curr->pid.max
695149 ? 3% -30.9% 480606 ? 6% sched_debug.cpu.curr->pid.stddev
356539 +22.0% 435017 sched_debug.cpu.nr_switches.avg
375244 +21.5% 455819 sched_debug.cpu.nr_switches.max
239983 +21.3% 290992 sched_debug.cpu.nr_switches.min
17526 ? 2% +25.0% 21908 ? 3% sched_debug.cpu.nr_switches.stddev
360657 ? 4% -81.8% 65651 ? 18% proc-vmstat.nr_active_anon
2656885 -2.5% 2589142 proc-vmstat.nr_file_pages
1706585 +13.3% 1933352 proc-vmstat.nr_inactive_anon
20023 +1.3% 20285 proc-vmstat.nr_kernel_stack
1863195 -3.6% 1795497 proc-vmstat.nr_shmem
360657 ? 4% -81.8% 65651 ? 18% proc-vmstat.nr_zone_active_anon
1706585 +13.3% 1933352 proc-vmstat.nr_zone_inactive_anon
56763914 -6.2% 53230553 proc-vmstat.numa_hit
56704746 -6.2% 53170292 proc-vmstat.numa_local
57428 ? 2% -50.0% 28742 ? 4% proc-vmstat.pgactivate
76049389 -7.0% 70719008 proc-vmstat.pgalloc_normal
73054423 -7.1% 67838944 proc-vmstat.pgfree
1.88 -18.2% 1.54 perf-stat.i.MPKI
2.208e+10 +17.1% 2.586e+10 perf-stat.i.branch-instructions
0.30 -0.0 0.27 perf-stat.i.branch-miss-rate%
61030073 +4.7% 63912456 perf-stat.i.branch-misses
25.57 -0.3 25.30 perf-stat.i.cache-miss-rate%
2.211e+08 -2.8% 2.149e+08 perf-stat.i.cache-misses
8.649e+08 -1.7% 8.499e+08 perf-stat.i.cache-references
765417 +21.4% 928892 perf-stat.i.context-switches
1.90 -16.2% 1.59 perf-stat.i.cpi
168415 -15.6% 142157 perf-stat.i.cpu-migrations
1008 +2.6% 1034 perf-stat.i.cycles-between-cache-misses
1.177e+11 +18.5% 1.394e+11 perf-stat.i.instructions
0.53 +18.7% 0.63 perf-stat.i.ipc
14.67 +14.4% 16.78 perf-stat.i.metric.K/sec
1.88 -18.0% 1.54 perf-stat.overall.MPKI
0.28 -0.0 0.25 perf-stat.overall.branch-miss-rate%
25.57 -0.3 25.30 perf-stat.overall.cache-miss-rate%
1.89 -15.8% 1.59 perf-stat.overall.cpi
1007 +2.6% 1033 perf-stat.overall.cycles-between-cache-misses
0.53 +18.8% 0.63 perf-stat.overall.ipc
724089 ? 6% +17.5% 850512 ? 7% perf-stat.ps.context-switches
159312 ? 6% -18.3% 130155 ? 7% perf-stat.ps.cpu-migrations
4.962e+12 ? 2% +21.1% 6.007e+12 ? 2% perf-stat.total.instructions
60.50 -11.7 48.80 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
60.48 -11.7 48.78 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.86 -7.5 26.35 perf-profile.calltrace.cycles-pp.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.85 -7.5 26.34 perf-profile.calltrace.cycles-pp.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.48 -7.4 22.06 perf-profile.calltrace.cycles-pp.exit_notify.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.34 -4.2 21.15 perf-profile.calltrace.cycles-pp.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
21.38 -4.2 17.18 perf-profile.calltrace.cycles-pp.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
25.29 -4.2 21.10 perf-profile.calltrace.cycles-pp.kernel_clone.__do_sys_clone3.do_syscall_64.entry_SYSCALL_64_after_hwframe
14.14 -4.1 10.07 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
13.70 -4.1 9.65 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.copy_process.kernel_clone.__do_sys_clone3
13.86 -3.7 10.11 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit.do_syscall_64
13.42 -3.7 9.68 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.__x64_sys_exit
15.23 -3.6 11.59 perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.__x64_sys_exit.do_syscall_64
13.21 -3.6 9.59 perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.__x64_sys_exit
12.73 -3.6 9.12 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
0.53 -0.3 0.25 ?100% perf-profile.calltrace.cycles-pp.remove_vm_area.vfree.delayed_vfree_work.process_one_work.worker_thread
1.20 -0.1 1.09 perf-profile.calltrace.cycles-pp.__schedule.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64
2.66 -0.1 2.54 perf-profile.calltrace.cycles-pp.ret_from_fork_asm
2.44 -0.1 2.32 perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
1.22 -0.1 1.10 perf-profile.calltrace.cycles-pp.do_task_dead.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
2.29 -0.1 2.19 perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
0.72 -0.1 0.66 perf-profile.calltrace.cycles-pp.__schedule.schedule.futex_wait_queue.__futex_wait.futex_wait
0.85 -0.1 0.78 perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.84 -0.1 0.78 perf-profile.calltrace.cycles-pp.futex_wait.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.83 -0.1 0.76 perf-profile.calltrace.cycles-pp.__futex_wait.futex_wait.do_futex.__x64_sys_futex.do_syscall_64
0.74 -0.1 0.67 perf-profile.calltrace.cycles-pp.futex_wait_queue.__futex_wait.futex_wait.do_futex.__x64_sys_futex
0.73 -0.1 0.66 perf-profile.calltrace.cycles-pp.schedule.futex_wait_queue.__futex_wait.futex_wait.do_futex
0.85 -0.1 0.79 perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.05 -0.1 0.99 perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.50 -0.0 1.45 perf-profile.calltrace.cycles-pp.alloc_pid.copy_process.kernel_clone.__do_sys_clone3.do_syscall_64
1.17 -0.0 1.12 perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
1.24 -0.0 1.19 perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.85 -0.0 0.81 perf-profile.calltrace.cycles-pp.delayed_vfree_work.process_one_work.worker_thread.kthread.ret_from_fork
0.80 -0.0 0.76 perf-profile.calltrace.cycles-pp.vfree.delayed_vfree_work.process_one_work.worker_thread.kthread
0.94 -0.0 0.90 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.alloc_pid.copy_process.kernel_clone
1.16 -0.0 1.12 perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork
0.94 -0.0 0.90 perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1.12 -0.0 1.08 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.alloc_pid.copy_process.kernel_clone.__do_sys_clone3
1.15 -0.0 1.11 perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread
1.14 -0.0 1.10 perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn
0.90 -0.0 0.87 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
0.89 -0.0 0.86 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
0.88 -0.0 0.86 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
0.82 -0.0 0.79 perf-profile.calltrace.cycles-pp.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct.copy_process
0.89 -0.0 0.87 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
0.67 -0.0 0.65 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.66 -0.0 0.64 perf-profile.calltrace.cycles-pp.__alloc_pages_bulk.__vmalloc_area_node.__vmalloc_node_range.alloc_thread_stack_node.dup_task_struct
0.92 -0.0 0.89 perf-profile.calltrace.cycles-pp.__madvise
2.11 +0.0 2.13 perf-profile.calltrace.cycles-pp.update_sg_wakeup_stats.sched_balance_find_dst_group.sched_balance_find_dst_cpu.select_task_rq_fair.wake_up_new_task
0.61 +0.0 0.64 perf-profile.calltrace.cycles-pp.do_futex.mm_release.exit_mm.do_exit.__x64_sys_exit
0.60 +0.0 0.64 perf-profile.calltrace.cycles-pp.futex_wake.do_futex.mm_release.exit_mm.do_exit
0.64 +0.0 0.67 perf-profile.calltrace.cycles-pp.mm_release.exit_mm.do_exit.__x64_sys_exit.do_syscall_64
0.78 +0.0 0.82 perf-profile.calltrace.cycles-pp.exit_mm.do_exit.__x64_sys_exit.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.52 +0.2 0.70 perf-profile.calltrace.cycles-pp.kmem_cache_free.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
0.53 +0.2 0.72 perf-profile.calltrace.cycles-pp.cp_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.53 +0.2 0.72 perf-profile.calltrace.cycles-pp.security_inode_getattr.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
0.55 +0.2 0.75 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.strncpy_from_user.getname_flags.vfs_fstatat
0.51 +0.2 0.72 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.64 ? 3% +0.2 0.86 perf-profile.calltrace.cycles-pp.shim_statx
0.64 +0.3 0.89 perf-profile.calltrace.cycles-pp.complete_walk.path_lookupat.filename_lookup.vfs_statx.do_statx
0.70 +0.3 0.95 perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.strncpy_from_user.getname_flags.__x64_sys_statx
0.72 +0.3 0.99 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
0.63 +0.3 0.90 perf-profile.calltrace.cycles-pp.dput.path_put.vfs_statx.vfs_fstatat.__do_sys_newfstatat
0.43 ? 44% +0.3 0.72 perf-profile.calltrace.cycles-pp.lockref_put_return.dput.path_put.vfs_statx.vfs_fstatat
0.67 +0.3 0.96 perf-profile.calltrace.cycles-pp.path_put.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
0.67 +0.3 0.96 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.42 ? 44% +0.3 0.71 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.83 +0.3 1.13 perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.do_statx
0.75 +0.3 1.08 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
0.75 +0.3 1.08 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__sched_yield
0.93 +0.3 1.28 perf-profile.calltrace.cycles-pp.kmem_cache_alloc.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.04 +0.4 1.40 perf-profile.calltrace.cycles-pp.cp_new_stat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
0.87 +0.4 1.23 perf-profile.calltrace.cycles-pp.__sched_yield
1.09 +0.4 1.48 perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.vfs_fstatat.__do_sys_newfstatat
1.42 +0.5 1.89 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.statx
1.42 +0.5 1.90 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.fstatat64
1.38 +0.5 1.88 perf-profile.calltrace.cycles-pp.__check_object_size.strncpy_from_user.getname_flags.__x64_sys_statx.do_syscall_64
0.00 +0.5 0.54 perf-profile.calltrace.cycles-pp._copy_to_user.cp_new_stat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.00 +0.5 0.54 ? 2% perf-profile.calltrace.cycles-pp.lookup_fast.walk_component.path_lookupat.filename_lookup.vfs_statx
1.38 +0.5 1.93 perf-profile.calltrace.cycles-pp.complete_walk.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
0.00 +0.6 0.55 perf-profile.calltrace.cycles-pp.path_init.path_lookupat.filename_lookup.vfs_statx.do_statx
0.00 +0.6 0.59 perf-profile.calltrace.cycles-pp.common_perm_cond.security_inode_getattr.vfs_statx.vfs_fstatat.__do_sys_newfstatat
0.00 +0.6 0.60 ? 2% perf-profile.calltrace.cycles-pp.path_init.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
0.00 +0.6 0.65 ? 2% perf-profile.calltrace.cycles-pp.walk_component.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
1.84 +0.7 2.50 perf-profile.calltrace.cycles-pp.link_path_walk.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat
0.00 +0.7 0.67 ? 2% perf-profile.calltrace.cycles-pp.inode_permission.link_path_walk.path_lookupat.filename_lookup.vfs_statx
1.69 +0.7 2.37 perf-profile.calltrace.cycles-pp.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat.filename_lookup
2.09 +0.7 2.81 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
1.90 +0.8 2.65 perf-profile.calltrace.cycles-pp.try_to_unlazy.complete_walk.path_lookupat.filename_lookup.vfs_statx
2.30 +0.8 3.14 perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.do_statx.__x64_sys_statx
2.63 +0.9 3.54 perf-profile.calltrace.cycles-pp.strncpy_from_user.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.95 +1.0 1.91 perf-profile.calltrace.cycles-pp.lockref_get_not_dead.__legitimize_path.try_to_unlazy.complete_walk.path_lookupat
2.87 +1.0 3.92 perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64
3.50 +1.2 4.72 perf-profile.calltrace.cycles-pp.getname_flags.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.10 +1.4 5.54 perf-profile.calltrace.cycles-pp.getname_flags.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
4.03 +1.5 5.52 perf-profile.calltrace.cycles-pp.vfs_statx.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.67 +1.7 6.40 perf-profile.calltrace.cycles-pp.path_lookupat.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat
5.24 +1.9 7.15 perf-profile.calltrace.cycles-pp.do_statx.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
5.44 +2.0 7.44 perf-profile.calltrace.cycles-pp.filename_lookup.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64
7.67 +2.8 10.49 perf-profile.calltrace.cycles-pp.vfs_statx.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe
10.57 +3.8 14.36 perf-profile.calltrace.cycles-pp.__x64_sys_statx.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
11.30 +4.1 15.41 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.statx
11.56 +4.2 15.77 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.statx
12.09 +4.4 16.46 perf-profile.calltrace.cycles-pp.vfs_fstatat.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
13.87 +5.0 18.84 perf-profile.calltrace.cycles-pp.statx
13.97 +5.0 18.98 perf-profile.calltrace.cycles-pp.__do_sys_newfstatat.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
14.70 +5.3 20.04 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fstatat64
14.99 +5.5 20.46 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fstatat64
17.21 +6.2 23.42 perf-profile.calltrace.cycles-pp.fstatat64
43.74 -11.4 32.30 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
41.21 -11.4 29.78 perf-profile.children.cycles-pp.queued_write_lock_slowpath
33.86 -7.5 26.35 perf-profile.children.cycles-pp.__x64_sys_exit
33.86 -7.5 26.36 perf-profile.children.cycles-pp.do_exit
29.50 -7.4 22.07 perf-profile.children.cycles-pp.exit_notify
25.30 -4.2 21.10 perf-profile.children.cycles-pp.kernel_clone
25.34 -4.2 21.15 perf-profile.children.cycles-pp.__do_sys_clone3
21.40 -4.2 17.21 perf-profile.children.cycles-pp.copy_process
15.24 -3.6 11.60 perf-profile.children.cycles-pp.release_task
89.78 -2.0 87.81 perf-profile.children.cycles-pp.do_syscall_64
90.27 -1.8 88.48 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
0.73 -0.2 0.58 perf-profile.children.cycles-pp.sched_balance_newidle
0.64 -0.1 0.50 ? 2% perf-profile.children.cycles-pp.sched_balance_rq
1.22 -0.1 1.10 perf-profile.children.cycles-pp.do_task_dead
2.44 -0.1 2.32 perf-profile.children.cycles-pp.ret_from_fork
2.66 -0.1 2.54 perf-profile.children.cycles-pp.ret_from_fork_asm
1.70 -0.1 1.60 perf-profile.children.cycles-pp.__do_softirq
2.29 -0.1 2.19 perf-profile.children.cycles-pp.kthread
1.59 -0.1 1.49 perf-profile.children.cycles-pp.rcu_core
1.56 -0.1 1.47 perf-profile.children.cycles-pp.rcu_do_batch
0.91 -0.1 0.83 perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.74 -0.1 0.67 perf-profile.children.cycles-pp.futex_wait_queue
0.84 -0.1 0.78 perf-profile.children.cycles-pp.futex_wait
0.85 -0.1 0.79 perf-profile.children.cycles-pp.__x64_sys_futex
0.54 -0.1 0.48 perf-profile.children.cycles-pp.irq_exit_rcu
0.83 -0.1 0.77 perf-profile.children.cycles-pp.__futex_wait
1.05 -0.1 0.99 perf-profile.children.cycles-pp.worker_thread
0.21 -0.1 0.16 ? 2% perf-profile.children.cycles-pp.detach_tasks
0.87 -0.0 0.83 perf-profile.children.cycles-pp.activate_task
1.50 -0.0 1.46 perf-profile.children.cycles-pp.alloc_pid
1.17 -0.0 1.12 perf-profile.children.cycles-pp.run_ksoftirqd
1.24 -0.0 1.19 perf-profile.children.cycles-pp.smpboot_thread_fn
0.55 -0.0 0.51 ? 2% perf-profile.children.cycles-pp.perf_session__process_user_event
0.85 -0.0 0.81 perf-profile.children.cycles-pp.delayed_vfree_work
0.54 -0.0 0.50 ? 2% perf-profile.children.cycles-pp.perf_session__deliver_event
0.94 -0.0 0.90 perf-profile.children.cycles-pp.process_one_work
0.55 -0.0 0.51 ? 2% perf-profile.children.cycles-pp.__ordered_events__flush
0.80 -0.0 0.76 perf-profile.children.cycles-pp.vfree
0.19 ? 3% -0.0 0.15 perf-profile.children.cycles-pp.free_unref_page_commit
0.24 ? 2% -0.0 0.20 perf-profile.children.cycles-pp.free_unref_page
0.50 -0.0 0.46 ? 2% perf-profile.children.cycles-pp.read
0.49 -0.0 0.45 perf-profile.children.cycles-pp.ksys_read
0.89 -0.0 0.85 perf-profile.children.cycles-pp.enqueue_task_fair
0.48 -0.0 0.44 ? 2% perf-profile.children.cycles-pp.seq_read
0.48 -0.0 0.44 ? 2% perf-profile.children.cycles-pp.seq_read_iter
0.44 -0.0 0.41 ? 2% perf-profile.children.cycles-pp.proc_pid_status
0.17 ? 4% -0.0 0.14 ? 3% perf-profile.children.cycles-pp.free_pcppages_bulk
0.48 -0.0 0.44 ? 3% perf-profile.children.cycles-pp.machine__process_fork_event
0.48 -0.0 0.44 perf-profile.children.cycles-pp.vfs_read
0.95 -0.0 0.91 perf-profile.children.cycles-pp.dequeue_task_fair
1.46 -0.0 1.42 perf-profile.children.cycles-pp.do_futex
0.19 ? 2% -0.0 0.16 ? 2% perf-profile.children.cycles-pp.update_sd_lb_stats
0.47 -0.0 0.44 ? 2% perf-profile.children.cycles-pp.____machine__findnew_thread
0.45 -0.0 0.41 perf-profile.children.cycles-pp.proc_single_show
0.78 -0.0 0.74 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.20 -0.0 0.17 ? 2% perf-profile.children.cycles-pp.sched_balance_find_src_group
0.56 -0.0 0.53 perf-profile.children.cycles-pp.dequeue_entity
0.19 ? 4% -0.0 0.16 ? 4% perf-profile.children.cycles-pp.__put_partials
0.51 -0.0 0.48 perf-profile.children.cycles-pp.enqueue_entity
0.53 -0.0 0.50 perf-profile.children.cycles-pp.remove_vm_area
0.17 ? 2% -0.0 0.14 ? 2% perf-profile.children.cycles-pp.update_sg_lb_stats
0.92 -0.0 0.90 perf-profile.children.cycles-pp.__madvise
0.89 -0.0 0.86 perf-profile.children.cycles-pp.__x64_sys_madvise
0.82 -0.0 0.79 perf-profile.children.cycles-pp.__vmalloc_area_node
0.89 -0.0 0.86 perf-profile.children.cycles-pp.do_madvise
0.68 -0.0 0.65 perf-profile.children.cycles-pp.madvise_vma_behavior
0.28 -0.0 0.25 perf-profile.children.cycles-pp.__slab_free
0.66 -0.0 0.64 perf-profile.children.cycles-pp.__alloc_pages_bulk
0.09 -0.0 0.07 ? 5% perf-profile.children.cycles-pp.flush_tlb_mm_range
0.63 -0.0 0.61 perf-profile.children.cycles-pp.zap_page_range_single
0.52 -0.0 0.50 perf-profile.children.cycles-pp.perf_event_task_output
0.09 ? 4% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.on_each_cpu_cond_mask
0.09 ? 4% -0.0 0.07 ? 5% perf-profile.children.cycles-pp.smp_call_function_many_cond
0.11 ? 4% -0.0 0.09 ? 6% perf-profile.children.cycles-pp.tlb_finish_mmu
0.80 -0.0 0.78 perf-profile.children.cycles-pp.__exit_signal
0.09 -0.0 0.08 ? 6% perf-profile.children.cycles-pp.schedule_tail
0.57 -0.0 0.55 perf-profile.children.cycles-pp.perf_iterate_sb
0.14 ? 3% -0.0 0.13 perf-profile.children.cycles-pp.__task_pid_nr_ns
0.48 -0.0 0.46 perf-profile.children.cycles-pp.clear_page_erms
0.07 -0.0 0.06 perf-profile.children.cycles-pp.__free_pages
0.07 -0.0 0.06 perf-profile.children.cycles-pp.bitmap_string
0.05 +0.0 0.06 perf-profile.children.cycles-pp.mem_cgroup_handle_over_high
0.07 +0.0 0.08 perf-profile.children.cycles-pp.rseq_get_rseq_cs
0.06 +0.0 0.07 ? 5% perf-profile.children.cycles-pp.cpuacct_charge
0.06 +0.0 0.07 ? 5% perf-profile.children.cycles-pp.os_xsave
0.18 ? 2% +0.0 0.19 perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
0.05 +0.0 0.06 ? 7% perf-profile.children.cycles-pp.select_idle_sibling
0.07 ? 5% +0.0 0.08 ? 5% perf-profile.children.cycles-pp.shmem_is_huge
0.09 ? 5% +0.0 0.11 ? 3% perf-profile.children.cycles-pp.rseq_ip_fixup
0.11 ? 3% +0.0 0.13 ? 2% perf-profile.children.cycles-pp.select_task_rq
0.09 +0.0 0.11 ? 4% perf-profile.children.cycles-pp.__switch_to
0.10 ? 3% +0.0 0.12 perf-profile.children.cycles-pp.___perf_sw_event
0.05 +0.0 0.07 ? 5% perf-profile.children.cycles-pp.make_vfsgid
0.07 ? 5% +0.0 0.09 perf-profile.children.cycles-pp.__enqueue_entity
0.13 +0.0 0.15 ? 4% perf-profile.children.cycles-pp.stress_fstat_thread
0.21 +0.0 0.23 perf-profile.children.cycles-pp.set_next_entity
0.05 +0.0 0.07 perf-profile.children.cycles-pp.proc_pid_get_link
0.07 +0.0 0.09 perf-profile.children.cycles-pp.pick_eevdf
0.49 +0.0 0.51 perf-profile.children.cycles-pp.try_to_wake_up
0.41 +0.0 0.43 perf-profile.children.cycles-pp.wake_up_q
0.22 ? 2% +0.0 0.25 perf-profile.children.cycles-pp.switch_fpu_return
0.12 ? 4% +0.0 0.15 perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.18 ? 2% +0.0 0.21 ? 3% perf-profile.children.cycles-pp.tick_nohz_handler
0.06 ? 6% +0.0 0.09 ? 5% perf-profile.children.cycles-pp.__x64_sys_newfstatat
0.17 ? 2% +0.0 0.20 ? 2% perf-profile.children.cycles-pp.update_process_times
0.03 ? 70% +0.0 0.06 perf-profile.children.cycles-pp.statx@plt
0.19 ? 2% +0.0 0.22 ? 3% perf-profile.children.cycles-pp.__hrtimer_run_queues
0.33 +0.0 0.36 ? 3% perf-profile.children.cycles-pp.pick_link
0.09 +0.0 0.12 perf-profile.children.cycles-pp.should_failslab
0.05 +0.0 0.08 perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.07 +0.0 0.10 perf-profile.children.cycles-pp.mntput
0.60 +0.0 0.64 perf-profile.children.cycles-pp.futex_wake
0.24 ? 2% +0.0 0.27 ? 3% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.23 ? 2% +0.0 0.26 ? 3% perf-profile.children.cycles-pp.hrtimer_interrupt
0.10 ? 4% +0.0 0.13 ? 2% perf-profile.children.cycles-pp.legitimize_links
0.64 +0.0 0.67 perf-profile.children.cycles-pp.mm_release
0.16 +0.0 0.19 ? 2% perf-profile.children.cycles-pp.prepare_task_switch
0.07 +0.0 0.10 ? 4% perf-profile.children.cycles-pp.yield_task_fair
0.78 +0.0 0.82 perf-profile.children.cycles-pp.exit_mm
0.12 ? 3% +0.0 0.16 ? 2% perf-profile.children.cycles-pp.mntput_no_expire
0.15 ? 3% +0.0 0.19 perf-profile.children.cycles-pp.__get_user_1
0.18 +0.0 0.22 ? 2% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.13 ? 3% +0.0 0.18 ? 3% perf-profile.children.cycles-pp.vfs_fstat
0.20 +0.0 0.25 ? 2% perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
0.02 ?141% +0.0 0.06 ? 17% perf-profile.children.cycles-pp.try_to_unlazy_next
0.00 +0.1 0.05 perf-profile.children.cycles-pp.apparmor_inode_getattr
0.00 +0.1 0.05 perf-profile.children.cycles-pp.tid_fd_revalidate
0.16 ? 3% +0.1 0.21 perf-profile.children.cycles-pp.security_inode_permission
0.17 ? 2% +0.1 0.22 ? 2% perf-profile.children.cycles-pp.from_kgid_munged
0.15 +0.1 0.21 ? 2% perf-profile.children.cycles-pp.is_vmalloc_addr
0.15 ? 2% +0.1 0.21 perf-profile.children.cycles-pp.amd_clear_divider
0.16 +0.1 0.22 perf-profile.children.cycles-pp.from_kuid_munged
0.67 +0.1 0.73 perf-profile.children.cycles-pp.update_curr
0.11 ? 4% +0.1 0.17 ? 2% perf-profile.children.cycles-pp.put_prev_entity
0.24 ? 4% +0.1 0.31 ? 5% perf-profile.children.cycles-pp.__fdget_raw
0.23 ? 2% +0.1 0.30 perf-profile.children.cycles-pp.rcu_all_qs
0.19 ? 16% +0.1 0.26 ? 10% perf-profile.children.cycles-pp.__lookup_mnt
0.16 ? 2% +0.1 0.23 perf-profile.children.cycles-pp.do_sched_yield
0.26 +0.1 0.34 perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.23 ? 2% +0.1 0.32 ? 5% perf-profile.children.cycles-pp.terminate_walk
0.24 ? 2% +0.1 0.33 perf-profile.children.cycles-pp.make_vfsuid
0.28 +0.1 0.38 perf-profile.children.cycles-pp.map_id_up
1.42 +0.1 1.52 ? 2% perf-profile.children.cycles-pp._raw_spin_lock
0.28 +0.1 0.39 perf-profile.children.cycles-pp.check_stack_object
0.32 +0.1 0.44 perf-profile.children.cycles-pp.generic_fillattr
0.34 +0.1 0.46 ? 2% perf-profile.children.cycles-pp.__legitimize_mnt
0.31 ? 4% +0.1 0.46 ? 6% perf-profile.children.cycles-pp.set_root
0.44 +0.1 0.59 perf-profile.children.cycles-pp.vfs_getattr_nosec
0.49 +0.2 0.64 perf-profile.children.cycles-pp.syscall_return_via_sysret
0.53 +0.2 0.70 ? 2% perf-profile.children.cycles-pp.generic_permission
0.58 +0.2 0.76 perf-profile.children.cycles-pp.shmem_getattr
0.55 +0.2 0.74 perf-profile.children.cycles-pp.cp_statx
0.45 ? 2% +0.2 0.64 ? 4% perf-profile.children.cycles-pp.nd_jump_root
0.62 +0.2 0.82 perf-profile.children.cycles-pp.__check_heap_object
0.69 +0.2 0.91 perf-profile.children.cycles-pp._copy_to_user
0.66 ? 3% +0.2 0.88 perf-profile.children.cycles-pp.shim_statx
0.60 +0.2 0.84 perf-profile.children.cycles-pp.__cond_resched
0.45 ? 34% +0.2 0.70 ? 25% perf-profile.children.cycles-pp.stress_fstat_helper
0.67 +0.2 0.91 perf-profile.children.cycles-pp.common_perm_cond
0.66 ? 2% +0.2 0.90 ? 2% perf-profile.children.cycles-pp.__virt_addr_valid
0.66 ? 4% +0.2 0.91 ? 4% perf-profile.children.cycles-pp.__d_lookup_rcu
0.77 +0.3 1.04 perf-profile.children.cycles-pp.inode_permission
2.72 +0.3 3.01 perf-profile.children.cycles-pp.__schedule
0.82 +0.3 1.11 perf-profile.children.cycles-pp.putname
0.67 +0.3 0.96 perf-profile.children.cycles-pp.__x64_sys_sched_yield
1.06 ? 2% +0.3 1.36 ? 2% perf-profile.children.cycles-pp.step_into
0.81 +0.3 1.11 perf-profile.children.cycles-pp.security_inode_getattr
1.94 +0.3 2.27 perf-profile.children.cycles-pp.kmem_cache_free
0.88 +0.3 1.21 ? 2% perf-profile.children.cycles-pp.path_init
0.82 +0.3 1.16 ? 2% perf-profile.children.cycles-pp.lockref_put_return
1.44 +0.3 1.78 perf-profile.children.cycles-pp.schedule
1.00 ? 2% +0.3 1.34 ? 2% perf-profile.children.cycles-pp.lookup_fast
0.88 +0.4 1.24 perf-profile.children.cycles-pp.__sched_yield
1.08 +0.4 1.45 perf-profile.children.cycles-pp.cp_new_stat
1.19 ? 2% +0.4 1.60 ? 2% perf-profile.children.cycles-pp.walk_component
1.07 +0.4 1.50 perf-profile.children.cycles-pp.path_put
1.51 +0.5 1.96 perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
1.13 +0.5 1.58 perf-profile.children.cycles-pp.dput
1.37 +0.5 1.86 perf-profile.children.cycles-pp.check_heap_object
1.27 +0.5 1.79 perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.43 +0.6 2.00 perf-profile.children.cycles-pp.lockref_get_not_dead
1.87 +0.6 2.46 perf-profile.children.cycles-pp.entry_SYSCALL_64
1.84 +0.7 2.50 perf-profile.children.cycles-pp.kmem_cache_alloc
1.80 +0.7 2.52 perf-profile.children.cycles-pp.__legitimize_path
1.96 +0.8 2.73 perf-profile.children.cycles-pp.try_to_unlazy
2.04 +0.8 2.85 perf-profile.children.cycles-pp.complete_walk
2.84 +1.0 3.81 perf-profile.children.cycles-pp.link_path_walk
2.78 +1.0 3.75 perf-profile.children.cycles-pp.__check_object_size
4.85 +1.7 6.52 perf-profile.children.cycles-pp.strncpy_from_user
5.28 +1.9 7.21 perf-profile.children.cycles-pp.do_statx
7.37 +2.6 9.97 perf-profile.children.cycles-pp.path_lookupat
8.01 +2.8 10.82 perf-profile.children.cycles-pp.getname_flags
8.65 +3.0 11.70 perf-profile.children.cycles-pp.filename_lookup
10.65 +3.8 14.48 perf-profile.children.cycles-pp.__x64_sys_statx
12.16 +4.3 16.49 perf-profile.children.cycles-pp.vfs_statx
12.54 +4.4 16.91 perf-profile.children.cycles-pp.vfs_fstatat
14.41 +5.0 19.42 perf-profile.children.cycles-pp.__do_sys_newfstatat
13.92 +5.0 18.92 perf-profile.children.cycles-pp.statx
17.67 +6.2 23.87 perf-profile.children.cycles-pp.fstatat64
43.74 -11.4 32.29 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
1.36 -0.0 1.33 perf-profile.self.cycles-pp.queued_write_lock_slowpath
0.14 ? 3% -0.0 0.12 ? 3% perf-profile.self.cycles-pp.update_sg_lb_stats
0.07 ? 5% -0.0 0.05 ? 8% perf-profile.self.cycles-pp.smp_call_function_many_cond
0.60 -0.0 0.58 perf-profile.self.cycles-pp.__memcpy
0.47 -0.0 0.46 perf-profile.self.cycles-pp.clear_page_erms
0.07 -0.0 0.06 perf-profile.self.cycles-pp.__free_pages
0.13 -0.0 0.12 perf-profile.self.cycles-pp.sched_balance_find_dst_cpu
0.07 +0.0 0.08 perf-profile.self.cycles-pp.available_idle_cpu
0.06 +0.0 0.07 perf-profile.self.cycles-pp.__dequeue_entity
0.09 +0.0 0.10 perf-profile.self.cycles-pp.__switch_to
0.06 +0.0 0.07 perf-profile.self.cycles-pp.shmem_is_huge
0.06 +0.0 0.07 ? 5% perf-profile.self.cycles-pp.os_xsave
0.18 ? 2% +0.0 0.19 perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.10 ? 4% +0.0 0.11 perf-profile.self.cycles-pp.___perf_sw_event
0.12 +0.0 0.14 ? 3% perf-profile.self.cycles-pp.stress_fstat_thread
0.06 ? 6% +0.0 0.08 perf-profile.self.cycles-pp.pick_eevdf
0.05 +0.0 0.07 perf-profile.self.cycles-pp.mntput
0.05 +0.0 0.07 perf-profile.self.cycles-pp.should_failslab
0.07 +0.0 0.09 perf-profile.self.cycles-pp.__enqueue_entity
0.07 +0.0 0.09 ? 5% perf-profile.self.cycles-pp.legitimize_links
0.08 ? 4% +0.0 0.11 perf-profile.self.cycles-pp.amd_clear_divider
0.09 +0.0 0.12 perf-profile.self.cycles-pp.__legitimize_path
0.08 +0.0 0.11 perf-profile.self.cycles-pp.pick_next_task_fair
0.10 +0.0 0.13 ? 2% perf-profile.self.cycles-pp.complete_walk
0.11 ? 4% +0.0 0.14 perf-profile.self.cycles-pp.dput
0.11 ? 4% +0.0 0.14 ? 2% perf-profile.self.cycles-pp.security_inode_getattr
0.11 +0.0 0.15 perf-profile.self.cycles-pp.mntput_no_expire
0.12 ? 3% +0.0 0.16 ? 2% perf-profile.self.cycles-pp.try_to_unlazy
0.86 +0.0 0.90 perf-profile.self.cycles-pp._raw_spin_lock
0.11 +0.0 0.15 ? 2% perf-profile.self.cycles-pp.is_vmalloc_addr
0.18 ? 2% +0.0 0.22 perf-profile.self.cycles-pp.switch_mm_irqs_off
0.14 ? 3% +0.0 0.18 perf-profile.self.cycles-pp.__get_user_1
0.13 +0.0 0.17 ? 2% perf-profile.self.cycles-pp.security_inode_permission
0.13 +0.0 0.17 ? 2% perf-profile.self.cycles-pp.terminate_walk
0.32 +0.0 0.36 ? 2% perf-profile.self.cycles-pp.update_curr
0.14 +0.0 0.19 ? 2% perf-profile.self.cycles-pp.nd_jump_root
0.19 ? 2% +0.0 0.24 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
0.18 +0.0 0.23 perf-profile.self.cycles-pp.rcu_all_qs
0.00 +0.1 0.05 perf-profile.self.cycles-pp.from_kgid_munged
0.00 +0.1 0.05 perf-profile.self.cycles-pp.make_vfsgid
0.00 +0.1 0.05 perf-profile.self.cycles-pp.path_put
0.00 +0.1 0.06 ? 9% perf-profile.self.cycles-pp.__x64_sys_newfstatat
0.18 ? 2% +0.1 0.24 ? 2% perf-profile.self.cycles-pp.walk_component
0.22 +0.1 0.28 perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
0.16 ? 16% +0.1 0.23 ? 12% perf-profile.self.cycles-pp.__lookup_mnt
0.23 ? 4% +0.1 0.29 ? 5% perf-profile.self.cycles-pp.__fdget_raw
0.20 ? 2% +0.1 0.27 perf-profile.self.cycles-pp.lookup_fast
0.20 ? 2% +0.1 0.27 perf-profile.self.cycles-pp.shmem_getattr
0.18 +0.1 0.25 perf-profile.self.cycles-pp.make_vfsuid
0.21 +0.1 0.28 perf-profile.self.cycles-pp.vfs_fstatat
0.22 +0.1 0.30 perf-profile.self.cycles-pp.cp_statx
0.24 ? 2% +0.1 0.32 perf-profile.self.cycles-pp.generic_fillattr
0.23 +0.1 0.31 perf-profile.self.cycles-pp.check_stack_object
0.24 ? 3% +0.1 0.33 perf-profile.self.cycles-pp.map_id_up
0.24 +0.1 0.33 perf-profile.self.cycles-pp.__x64_sys_statx
0.31 ? 2% +0.1 0.40 perf-profile.self.cycles-pp.__cond_resched
0.23 ? 2% +0.1 0.32 ? 3% perf-profile.self.cycles-pp.inode_permission
0.30 ? 2% +0.1 0.40 perf-profile.self.cycles-pp.path_init
0.34 +0.1 0.44 perf-profile.self.cycles-pp.__schedule
0.32 +0.1 0.42 perf-profile.self.cycles-pp.path_lookupat
0.31 +0.1 0.42 ? 2% perf-profile.self.cycles-pp.__legitimize_mnt
0.30 ? 3% +0.1 0.43 ? 6% perf-profile.self.cycles-pp.set_root
0.42 +0.1 0.56 perf-profile.self.cycles-pp.syscall_exit_to_user_mode
0.42 +0.1 0.56 ? 3% perf-profile.self.cycles-pp.generic_permission
0.41 +0.1 0.56 perf-profile.self.cycles-pp.vfs_getattr_nosec
0.48 +0.1 0.63 perf-profile.self.cycles-pp.entry_SYSCALL_64
0.42 +0.2 0.57 perf-profile.self.cycles-pp.cp_new_stat
0.48 +0.2 0.64 perf-profile.self.cycles-pp.syscall_return_via_sysret
0.59 +0.2 0.77 perf-profile.self.cycles-pp.__check_heap_object
0.54 +0.2 0.72 perf-profile.self.cycles-pp.vfs_statx
0.54 +0.2 0.73 ? 2% perf-profile.self.cycles-pp.step_into
0.52 ? 8% +0.2 0.72 ? 5% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.58 +0.2 0.77 perf-profile.self.cycles-pp.check_heap_object
0.56 +0.2 0.77 perf-profile.self.cycles-pp.__check_object_size
0.67 +0.2 0.88 perf-profile.self.cycles-pp._copy_to_user
0.62 ? 3% +0.2 0.83 perf-profile.self.cycles-pp.shim_statx
0.62 +0.2 0.84 perf-profile.self.cycles-pp.common_perm_cond
0.39 ? 38% +0.2 0.62 ? 27% perf-profile.self.cycles-pp.stress_fstat_helper
0.61 ? 4% +0.2 0.84 ? 4% perf-profile.self.cycles-pp.__d_lookup_rcu
0.60 ? 2% +0.2 0.84 ? 2% perf-profile.self.cycles-pp.__virt_addr_valid
0.65 +0.2 0.88 perf-profile.self.cycles-pp.do_statx
0.67 +0.2 0.90 perf-profile.self.cycles-pp.__do_sys_newfstatat
0.78 +0.3 1.04 perf-profile.self.cycles-pp.do_syscall_64
0.76 +0.3 1.02 perf-profile.self.cycles-pp.putname
0.77 +0.3 1.05 perf-profile.self.cycles-pp.getname_flags
0.84 +0.3 1.12 ? 3% perf-profile.self.cycles-pp.link_path_walk
0.87 +0.3 1.16 perf-profile.self.cycles-pp.fstatat64
0.94 +0.3 1.26 perf-profile.self.cycles-pp.statx
0.80 +0.3 1.12 ? 2% perf-profile.self.cycles-pp.lockref_put_return
1.21 +0.4 1.60 perf-profile.self.cycles-pp.kmem_cache_free
1.46 +0.4 1.90 perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
1.26 +0.4 1.70 perf-profile.self.cycles-pp.filename_lookup
1.39 +0.5 1.89 perf-profile.self.cycles-pp.kmem_cache_alloc
1.39 +0.6 1.96 perf-profile.self.cycles-pp.lockref_get_not_dead
2.12 +0.7 2.83 perf-profile.self.cycles-pp.strncpy_from_user



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
gcc-13/performance/4BRD_12G/xfs/x86_64-rhel-8.3/300/RAID1/debian-12-x86_64-20240206.cgz/lkp-csl-2sp3/sync_disk_rw/aim7

commit:
97450eb909 ("sched/pelt: Remove shift of thermal clock")
e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41
---------------- ---------------------------
%stddev %change %stddev
\ | \
24356729 ? 2% +15.8% 28212175 cpuidle..usage
30.69 ? 2% +13.6% 34.87 ? 2% iostat.cpu.idle
68.53 -6.3% 64.25 iostat.cpu.system
753687 +10.4% 831869 ? 4% meminfo.Inactive(anon)
164050 ? 6% +45.9% 239358 ? 11% meminfo.Mapped
6182 +9.6% 6775 ? 2% perf-c2c.DRAM.remote
3724 ? 3% +11.5% 4151 ? 2% perf-c2c.HITM.remote
648354 +7.8% 698666 ? 2% vmstat.io.bo
95.98 ? 6% -12.9% 83.64 ? 4% vmstat.procs.r
801317 +16.2% 930980 ? 2% vmstat.system.cs
29.52 ? 2% +4.2 33.67 ? 3% mpstat.cpu.all.idle%
0.04 ? 5% -0.0 0.04 ? 5% mpstat.cpu.all.iowait%
0.74 +0.1 0.85 ? 3% mpstat.cpu.all.usr%
85.46 -10.7% 76.32 mpstat.max_utilization_pct
16199 +9.6% 17753 aim7.jobs-per-min
111.17 -8.7% 101.44 aim7.time.elapsed_time
111.17 -8.7% 101.44 aim7.time.elapsed_time.max
4379281 +75.1% 7669146 aim7.time.involuntary_context_switches
6400 -6.5% 5983 aim7.time.percent_of_cpu_this_job_got
7089 -14.8% 6042 aim7.time.system_time
52793765 -4.9% 50219682 aim7.time.voluntary_context_switches
336033 +3.7% 348553 ? 2% proc-vmstat.nr_active_anon
1179748 +2.6% 1210206 proc-vmstat.nr_file_pages
188573 +10.4% 208211 ? 4% proc-vmstat.nr_inactive_anon
41560 ? 6% +45.9% 60654 ? 11% proc-vmstat.nr_mapped
360884 +8.6% 391938 proc-vmstat.nr_shmem
336033 +3.7% 348553 ? 2% proc-vmstat.nr_zone_active_anon
188573 +10.4% 208211 ? 4% proc-vmstat.nr_zone_inactive_anon
1474174 -21.8% 1153102 sched_debug.cfs_rq:/.avg_vruntime.avg
1644121 -20.9% 1300217 ? 3% sched_debug.cfs_rq:/.avg_vruntime.max
1394133 -21.2% 1098002 ? 2% sched_debug.cfs_rq:/.avg_vruntime.min
38297 ? 7% -36.5% 24315 ? 12% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.78 ? 18% -27.4% 0.57 ? 7% sched_debug.cfs_rq:/.h_nr_running.avg
1474174 -21.8% 1153102 sched_debug.cfs_rq:/.min_vruntime.avg
1644121 -20.9% 1300217 ? 3% sched_debug.cfs_rq:/.min_vruntime.max
1394133 -21.2% 1098002 ? 2% sched_debug.cfs_rq:/.min_vruntime.min
38297 ? 7% -36.5% 24315 ? 12% sched_debug.cfs_rq:/.min_vruntime.stddev
0.47 ? 2% -9.8% 0.42 ? 4% sched_debug.cfs_rq:/.nr_running.avg
853.31 ? 10% -26.5% 627.20 sched_debug.cfs_rq:/.runnable_avg.avg
3612 ? 27% -45.1% 1983 ? 17% sched_debug.cfs_rq:/.runnable_avg.max
588.74 ? 21% -37.5% 367.74 ? 5% sched_debug.cfs_rq:/.runnable_avg.stddev
2305 ? 22% -30.1% 1612 ? 16% sched_debug.cfs_rq:/.util_avg.max
394.30 ? 16% -21.5% 309.35 ? 5% sched_debug.cfs_rq:/.util_avg.stddev
272.57 ? 17% -41.5% 159.47 ? 7% sched_debug.cfs_rq:/.util_est.avg
377.83 ? 35% -43.3% 214.15 ? 16% sched_debug.cfs_rq:/.util_est.stddev
0.79 ? 18% -30.8% 0.55 ? 7% sched_debug.cpu.nr_running.avg
221477 +20.1% 265968 sched_debug.cpu.nr_switches.avg
238906 ? 3% +18.7% 283476 ? 2% sched_debug.cpu.nr_switches.max
6940 ? 8% +31.5% 9124 ? 9% sched_debug.cpu.nr_switches.stddev
1.47 ? 2% +10.1% 1.62 ? 3% perf-stat.i.MPKI
1.21 +0.2 1.36 ? 3% perf-stat.i.branch-miss-rate%
86242084 ? 4% +16.3% 1.003e+08 ? 4% perf-stat.i.branch-misses
80906704 ? 2% +12.5% 90980799 ? 2% perf-stat.i.cache-misses
817901 +16.5% 952574 ? 2% perf-stat.i.context-switches
3.85 -7.4% 3.57 perf-stat.i.cpi
2.134e+11 -5.3% 2.02e+11 perf-stat.i.cpu-cycles
2671 ? 2% -14.9% 2272 ? 3% perf-stat.i.cycles-between-cache-misses
0.32 +7.6% 0.34 perf-stat.i.ipc
10.77 +13.5% 12.23 ? 2% perf-stat.i.metric.K/sec
6523 +8.0% 7047 ? 3% perf-stat.i.minor-faults
6524 +8.0% 7048 ? 3% perf-stat.i.page-faults
1.54 ? 2% +10.5% 1.70 ? 2% perf-stat.overall.MPKI
0.80 ? 3% +0.1 0.92 ? 4% perf-stat.overall.branch-miss-rate%
4.06 -7.0% 3.78 perf-stat.overall.cpi
2640 ? 2% -15.8% 2222 ? 2% perf-stat.overall.cycles-between-cache-misses
0.25 +7.6% 0.26 perf-stat.overall.ipc
85266080 ? 4% +16.3% 99200665 ? 4% perf-stat.ps.branch-misses
80071004 ? 2% +12.5% 90070323 ? 2% perf-stat.ps.cache-misses
809411 +16.5% 943274 ? 2% perf-stat.ps.context-switches
2.113e+11 -5.3% 2e+11 perf-stat.ps.cpu-cycles
5.852e+12 -6.5% 5.47e+12 ? 2% perf-stat.total.instructions
12.13 ? 4% -4.2 7.88 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request
12.19 ? 4% -4.2 7.96 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.md_flush_request.raid1_make_request.md_handle_request.__submit_bio
13.44 ? 4% -4.2 9.25 ? 3% perf-profile.calltrace.cycles-pp.md_flush_request.raid1_make_request.md_handle_request.__submit_bio.__submit_bio_noacct
13.46 ? 4% -4.2 9.27 ? 3% perf-profile.calltrace.cycles-pp.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush
13.48 ? 4% -4.2 9.30 ? 3% perf-profile.calltrace.cycles-pp.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write
13.44 ? 4% -4.2 9.26 ? 3% perf-profile.calltrace.cycles-pp.raid1_make_request.md_handle_request.__submit_bio.__submit_bio_noacct.submit_bio_wait
13.48 ? 4% -4.2 9.30 ? 3% perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync
13.57 ? 4% -4.2 9.41 ? 3% perf-profile.calltrace.cycles-pp.submit_bio_wait.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write
13.59 ? 4% -4.2 9.43 ? 3% perf-profile.calltrace.cycles-pp.blkdev_issue_flush.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write
84.77 -3.7 81.04 perf-profile.calltrace.cycles-pp.xfs_file_fsync.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
86.25 -3.7 82.54 perf-profile.calltrace.cycles-pp.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
86.38 -3.7 82.68 perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
86.40 -3.7 82.69 perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
86.57 -3.7 82.88 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
86.56 -3.7 82.88 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
86.80 -3.7 83.13 perf-profile.calltrace.cycles-pp.write
3.35 -1.3 2.10 ? 3% perf-profile.calltrace.cycles-pp.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
3.06 -1.2 1.86 ? 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync
3.06 -1.2 1.86 ? 2% perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
3.05 -1.2 1.85 ? 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_wait_on_iclog.xfs_log_force_seq
3.86 ? 2% -1.1 2.78 ? 3% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync
3.84 ? 2% -1.1 2.76 ? 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq
3.86 ? 2% -1.1 2.78 ? 3% perf-profile.calltrace.cycles-pp.remove_wait_queue.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
0.84 ? 2% -0.0 0.80 perf-profile.calltrace.cycles-pp.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write.ksys_write
0.82 ? 2% -0.0 0.77 perf-profile.calltrace.cycles-pp.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write.vfs_write
0.58 ? 2% -0.0 0.55 ? 2% perf-profile.calltrace.cycles-pp.__xfs_trans_commit.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_buffered_write
0.51 +0.0 0.56 perf-profile.calltrace.cycles-pp.xfs_end_ioend.xfs_end_io.process_one_work.worker_thread.kthread
0.52 +0.0 0.57 perf-profile.calltrace.cycles-pp.xfs_end_io.process_one_work.worker_thread.kthread.ret_from_fork
0.58 ? 2% +0.1 0.62 ? 3% perf-profile.calltrace.cycles-pp.iomap_file_buffered_write.xfs_file_buffered_write.vfs_write.ksys_write.do_syscall_64
0.90 ? 2% +0.1 0.99 ? 3% perf-profile.calltrace.cycles-pp.copy_to_brd.brd_submit_bio.__submit_bio.__submit_bio_noacct.iomap_submit_ioend
2.58 ? 2% +0.1 2.68 ? 2% perf-profile.calltrace.cycles-pp.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages.do_writepages
2.59 ? 2% +0.1 2.70 ? 2% perf-profile.calltrace.cycles-pp.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc
2.12 +0.1 2.22 perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2.62 ? 2% +0.1 2.73 ? 2% perf-profile.calltrace.cycles-pp.iomap_submit_ioend.xfs_vm_writepages.do_writepages.filemap_fdatawrite_wbc.__filemap_fdatawrite_range
1.17 ? 2% +0.1 1.29 ? 3% perf-profile.calltrace.cycles-pp.brd_submit_bio.__submit_bio.__submit_bio_noacct.iomap_submit_ioend.xfs_vm_writepages
2.24 +0.1 2.38 perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
2.24 +0.1 2.38 perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
2.24 +0.1 2.38 perf-profile.calltrace.cycles-pp.ret_from_fork_asm
2.23 +0.1 2.37 perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.91 ? 2% +0.7 1.58 perf-profile.calltrace.cycles-pp.mutex_spin_on_owner.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
54.72 +1.0 55.70 perf-profile.calltrace.cycles-pp.osq_lock.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq
1.71 ? 5% +1.1 2.86 ? 10% perf-profile.calltrace.cycles-pp.intel_idle_irq.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
64.02 +1.6 65.62 perf-profile.calltrace.cycles-pp.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write.vfs_write
7.56 ? 2% +1.9 9.42 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
55.82 +2.0 57.86 perf-profile.calltrace.cycles-pp.__mutex_lock.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq
58.14 +2.6 60.76 perf-profile.calltrace.cycles-pp.__flush_workqueue.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync
59.46 +2.7 62.18 perf-profile.calltrace.cycles-pp.xlog_cil_push_now.xlog_cil_force_seq.xfs_log_force_seq.xfs_file_fsync.xfs_file_buffered_write
9.68 ? 2% +3.2 12.86 ? 3% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
9.69 ? 2% +3.2 12.86 ? 3% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
9.86 ? 2% +3.2 13.08 ? 3% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
10.46 ? 2% +3.5 13.94 ? 2% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
10.47 ? 2% +3.5 13.94 ? 2% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
10.47 ? 2% +3.5 13.94 ? 2% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
10.60 ? 2% +3.5 14.08 ? 2% perf-profile.calltrace.cycles-pp.common_startup_64
22.14 ? 2% -6.5 15.63 ? 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
12.70 ? 3% -4.2 8.46 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_irq
13.54 ? 4% -4.2 9.34 ? 3% perf-profile.children.cycles-pp.md_flush_request
14.22 ? 3% -4.2 10.06 ? 3% perf-profile.children.cycles-pp.md_handle_request
14.20 ? 3% -4.2 10.04 ? 3% perf-profile.children.cycles-pp.raid1_make_request
13.57 ? 4% -4.2 9.41 ? 3% perf-profile.children.cycles-pp.submit_bio_wait
13.59 ? 4% -4.2 9.43 ? 3% perf-profile.children.cycles-pp.blkdev_issue_flush
16.32 ? 3% -4.1 12.26 ? 2% perf-profile.children.cycles-pp.__submit_bio
16.34 ? 3% -4.1 12.28 ? 2% perf-profile.children.cycles-pp.__submit_bio_noacct
84.77 -3.7 81.04 perf-profile.children.cycles-pp.xfs_file_fsync
86.25 -3.7 82.54 perf-profile.children.cycles-pp.xfs_file_buffered_write
86.40 -3.7 82.70 perf-profile.children.cycles-pp.vfs_write
86.41 -3.7 82.71 perf-profile.children.cycles-pp.ksys_write
86.71 -3.7 83.03 perf-profile.children.cycles-pp.do_syscall_64
86.71 -3.7 83.04 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
86.85 -3.7 83.18 perf-profile.children.cycles-pp.write
8.05 -2.3 5.71 ? 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
7.14 -2.3 4.82 ? 2% perf-profile.children.cycles-pp.remove_wait_queue
3.48 -1.3 2.23 ? 3% perf-profile.children.cycles-pp.xlog_wait_on_iclog
0.28 ? 3% -0.1 0.21 ? 5% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.26 ? 3% -0.1 0.19 ? 4% perf-profile.children.cycles-pp.sysvec_call_function_single
0.24 ? 3% -0.1 0.18 ? 5% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.16 ? 14% -0.1 0.10 ? 7% perf-profile.children.cycles-pp.sb_clear_inode_writeback
0.15 ? 10% -0.1 0.10 ? 8% perf-profile.children.cycles-pp.sb_mark_inode_writeback
0.34 ? 6% -0.0 0.29 ? 4% perf-profile.children.cycles-pp.__folio_end_writeback
0.21 ? 8% -0.0 0.17 ? 4% perf-profile.children.cycles-pp.__folio_start_writeback
0.84 ? 2% -0.0 0.80 perf-profile.children.cycles-pp.kiocb_modified
0.82 ? 2% -0.0 0.78 perf-profile.children.cycles-pp.xfs_vn_update_time
0.33 ? 4% -0.0 0.30 ? 4% perf-profile.children.cycles-pp.iomap_writepage_map
0.43 ? 3% -0.0 0.40 ? 3% perf-profile.children.cycles-pp.iomap_writepages
0.10 ? 6% -0.0 0.08 ? 7% perf-profile.children.cycles-pp.xfs_log_ticket_ungrant
0.12 -0.0 0.11 ? 3% perf-profile.children.cycles-pp.xlog_state_clean_iclog
0.05 +0.0 0.06 perf-profile.children.cycles-pp.__update_blocked_fair
0.05 +0.0 0.06 perf-profile.children.cycles-pp.kmem_cache_free
0.11 +0.0 0.12 perf-profile.children.cycles-pp.llseek
0.07 +0.0 0.08 perf-profile.children.cycles-pp.switch_fpu_return
0.06 +0.0 0.07 perf-profile.children.cycles-pp.sched_clock
0.06 +0.0 0.07 perf-profile.children.cycles-pp.wake_page_function
0.10 +0.0 0.11 ? 4% perf-profile.children.cycles-pp.xfs_buffered_write_iomap_begin
0.10 ? 3% +0.0 0.12 ? 4% perf-profile.children.cycles-pp.__switch_to_asm
0.05 +0.0 0.06 ? 7% perf-profile.children.cycles-pp.ktime_get
0.05 ? 8% +0.0 0.07 ? 7% perf-profile.children.cycles-pp.xfs_btree_lookup_get_block
0.07 ? 7% +0.0 0.08 perf-profile.children.cycles-pp.__switch_to
0.05 ? 7% +0.0 0.06 ? 7% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.05 ? 7% +0.0 0.07 ? 7% perf-profile.children.cycles-pp.mutex_lock
0.07 ? 5% +0.0 0.09 ? 5% perf-profile.children.cycles-pp.select_idle_core
0.05 ? 8% +0.0 0.07 ? 5% perf-profile.children.cycles-pp.llist_add_batch
0.06 ? 8% +0.0 0.07 ? 5% perf-profile.children.cycles-pp.mutex_unlock
0.06 ? 6% +0.0 0.08 ? 6% perf-profile.children.cycles-pp.perf_tp_event
0.14 ? 3% +0.0 0.15 ? 2% perf-profile.children.cycles-pp.xlog_cil_committed
0.12 ? 3% +0.0 0.14 ? 4% perf-profile.children.cycles-pp.iomap_iter
0.08 ? 4% +0.0 0.10 ? 3% perf-profile.children.cycles-pp.xfs_btree_lookup
0.14 ? 3% +0.0 0.15 ? 3% perf-profile.children.cycles-pp.xlog_cil_process_committed
0.13 ? 5% +0.0 0.15 perf-profile.children.cycles-pp.xlog_cil_write_commit_record
0.07 ? 7% +0.0 0.08 ? 5% perf-profile.children.cycles-pp.submit_flushes
0.13 ? 4% +0.0 0.15 ? 2% perf-profile.children.cycles-pp.xlog_cil_set_ctx_write_state
0.09 ? 5% +0.0 0.11 ? 6% perf-profile.children.cycles-pp.kick_pool
0.14 ? 4% +0.0 0.16 ? 2% perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
0.11 ? 3% +0.0 0.13 ? 4% perf-profile.children.cycles-pp.__queue_work
0.07 ? 6% +0.0 0.10 ? 8% perf-profile.children.cycles-pp.__smp_call_single_queue
0.13 ? 3% +0.0 0.15 ? 2% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.14 ? 3% +0.0 0.17 ? 2% perf-profile.children.cycles-pp.prepare_task_switch
0.11 +0.0 0.13 ? 5% perf-profile.children.cycles-pp.sched_balance_update_blocked_averages
0.15 ? 3% +0.0 0.18 ? 4% perf-profile.children.cycles-pp.bio_alloc_bioset
0.18 ? 4% +0.0 0.20 ? 3% perf-profile.children.cycles-pp.xfs_bmapi_write
0.08 +0.0 0.10 ? 4% perf-profile.children.cycles-pp.__cond_resched
0.15 ? 5% +0.0 0.17 ? 2% perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
0.28 ? 3% +0.0 0.30 perf-profile.children.cycles-pp.select_idle_sibling
0.18 ? 5% +0.0 0.20 ? 4% perf-profile.children.cycles-pp.available_idle_cpu
0.11 ? 4% +0.0 0.14 ? 4% perf-profile.children.cycles-pp.queue_work_on
0.12 +0.0 0.15 ? 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.58 +0.0 0.60 ? 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
0.22 ? 4% +0.0 0.25 ? 2% perf-profile.children.cycles-pp.select_idle_cpu
0.14 ? 3% +0.0 0.18 ? 3% perf-profile.children.cycles-pp.menu_select
0.36 ? 2% +0.0 0.41 perf-profile.children.cycles-pp.xfs_iomap_write_unwritten
0.51 +0.0 0.56 perf-profile.children.cycles-pp.xfs_end_ioend
0.52 +0.0 0.57 perf-profile.children.cycles-pp.xfs_end_io
0.58 ? 2% +0.1 0.63 ? 3% perf-profile.children.cycles-pp.iomap_file_buffered_write
0.48 +0.1 0.53 perf-profile.children.cycles-pp.sched_ttwu_pending
0.00 +0.1 0.06 ? 16% perf-profile.children.cycles-pp.poll_idle
0.57 +0.1 0.63 perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.24 ? 3% +0.1 0.32 ? 2% perf-profile.children.cycles-pp.flush_workqueue_prep_pwqs
0.20 ? 3% +0.1 0.28 ? 2% perf-profile.children.cycles-pp.schedule_idle
2.12 +0.1 2.22 perf-profile.children.cycles-pp.process_one_work
2.62 ? 2% +0.1 2.73 ? 2% perf-profile.children.cycles-pp.iomap_submit_ioend
1.04 ? 2% +0.1 1.15 ? 2% perf-profile.children.cycles-pp.copy_to_brd
2.24 +0.1 2.38 perf-profile.children.cycles-pp.kthread
2.24 +0.1 2.38 perf-profile.children.cycles-pp.ret_from_fork
2.24 +0.1 2.38 perf-profile.children.cycles-pp.ret_from_fork_asm
2.23 +0.1 2.37 perf-profile.children.cycles-pp.worker_thread
1.32 ? 2% +0.1 1.46 ? 2% perf-profile.children.cycles-pp.brd_submit_bio
0.36 ? 2% +0.1 0.50 ? 2% perf-profile.children.cycles-pp.flush_smp_call_function_queue
0.89 +0.2 1.05 ? 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
0.09 ? 4% +0.2 0.27 ? 2% perf-profile.children.cycles-pp.wake_up_q
1.25 +0.2 1.46 perf-profile.children.cycles-pp.try_to_wake_up
0.10 ? 8% +0.2 0.31 ? 18% perf-profile.children.cycles-pp.schedule_preempt_disabled
0.12 ? 3% +0.3 0.40 perf-profile.children.cycles-pp.__mutex_unlock_slowpath
0.91 ? 2% +0.7 1.58 perf-profile.children.cycles-pp.mutex_spin_on_owner
54.74 +1.0 55.72 perf-profile.children.cycles-pp.osq_lock
2.02 ? 5% +1.3 3.30 ? 10% perf-profile.children.cycles-pp.intel_idle_irq
64.02 +1.6 65.62 perf-profile.children.cycles-pp.xlog_cil_force_seq
7.66 ? 2% +1.9 9.52 perf-profile.children.cycles-pp.intel_idle
55.82 +2.0 57.86 perf-profile.children.cycles-pp.__mutex_lock
58.14 +2.6 60.76 perf-profile.children.cycles-pp.__flush_workqueue
59.46 +2.7 62.18 perf-profile.children.cycles-pp.xlog_cil_push_now
9.81 ? 2% +3.2 12.99 ? 3% perf-profile.children.cycles-pp.cpuidle_enter_state
9.81 ? 2% +3.2 12.99 ? 3% perf-profile.children.cycles-pp.cpuidle_enter
9.98 ? 2% +3.2 13.22 ? 3% perf-profile.children.cycles-pp.cpuidle_idle_call
10.47 ? 2% +3.5 13.94 ? 2% perf-profile.children.cycles-pp.start_secondary
10.60 ? 2% +3.5 14.08 ? 2% perf-profile.children.cycles-pp.do_idle
10.60 ? 2% +3.5 14.08 ? 2% perf-profile.children.cycles-pp.common_startup_64
10.60 ? 2% +3.5 14.08 ? 2% perf-profile.children.cycles-pp.cpu_startup_entry
22.12 ? 2% -6.5 15.60 ? 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.08 ? 5% -0.0 0.07 ? 7% perf-profile.self.cycles-pp.xfs_log_ticket_ungrant
0.05 +0.0 0.06 ? 6% perf-profile.self.cycles-pp.finish_task_switch
0.05 ? 8% +0.0 0.07 ? 5% perf-profile.self.cycles-pp.llist_add_batch
0.06 ? 8% +0.0 0.07 ? 5% perf-profile.self.cycles-pp.mutex_unlock
0.06 ? 6% +0.0 0.08 perf-profile.self.cycles-pp.__switch_to
0.08 ? 5% +0.0 0.10 ? 4% perf-profile.self.cycles-pp.prepare_task_switch
0.06 ? 6% +0.0 0.08 ? 4% perf-profile.self.cycles-pp.try_to_wake_up
0.14 ? 2% +0.0 0.16 ? 5% perf-profile.self.cycles-pp.__schedule
0.18 ? 4% +0.0 0.20 ? 3% perf-profile.self.cycles-pp.available_idle_cpu
0.07 ? 10% +0.0 0.10 ? 5% perf-profile.self.cycles-pp.menu_select
0.37 +0.0 0.41 perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.12 ? 6% +0.0 0.16 ? 3% perf-profile.self.cycles-pp.flush_workqueue_prep_pwqs
0.23 ? 2% +0.0 0.28 ? 3% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.00 +0.1 0.05 perf-profile.self.cycles-pp.vfs_write
0.08 ? 4% +0.1 0.14 ? 6% perf-profile.self.cycles-pp.__mutex_lock
0.00 +0.1 0.06 ? 13% perf-profile.self.cycles-pp.poll_idle
0.36 ? 2% +0.1 0.45 ? 3% perf-profile.self.cycles-pp._raw_spin_lock
1.01 ? 2% +0.1 1.12 ? 2% perf-profile.self.cycles-pp.copy_to_brd
0.90 ? 2% +0.7 1.57 perf-profile.self.cycles-pp.mutex_spin_on_owner
54.26 +1.0 55.30 perf-profile.self.cycles-pp.osq_lock
1.96 ? 5% +1.3 3.22 ? 10% perf-profile.self.cycles-pp.intel_idle_irq
7.66 ? 2% +1.9 9.52 perf-profile.self.cycles-pp.intel_idle



***************************************************************************************************
lkp-csl-2sp3: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
=========================================================================================
build_kconfig/compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/target/tbox_group/testcase:
defconfig/gcc-13/performance/x86_64-rhel-8.3/200%/debian-12-x86_64-20240206.cgz/300s/vmlinux/lkp-csl-2sp3/kbuild

commit:
97450eb909 ("sched/pelt: Remove shift of thermal clock")
e2bbd1c498 ("sched/fair: Reschedule the cfs_rq when current is ineligible")

97450eb909658573 e2bbd1c498980c5cb68f9973f41
---------------- ---------------------------
%stddev %change %stddev
\ | \
1944235 -11.3% 1724001 cpuidle..usage
0.07 -0.0 0.07 mpstat.cpu.all.soft%
18920 +173.4% 51728 vmstat.system.cs
66019 +2.3% 67537 vmstat.system.in
238928 ? 4% +18.7% 283496 ? 6% numa-meminfo.node1.Active
238928 ? 4% +18.7% 283496 ? 6% numa-meminfo.node1.Active(anon)
252057 ? 4% +19.9% 302260 ? 6% numa-meminfo.node1.Shmem
59680 ? 4% +18.6% 70769 ? 6% numa-vmstat.node1.nr_active_anon
62962 ? 4% +20.0% 75559 ? 6% numa-vmstat.node1.nr_shmem
59680 ? 4% +18.6% 70769 ? 6% numa-vmstat.node1.nr_zone_active_anon
40.33 ? 16% +46.7% 59.17 ? 17% perf-c2c.DRAM.remote
99.50 ? 7% +36.3% 135.67 ? 10% perf-c2c.HITM.local
21.67 ? 22% +85.4% 40.17 ? 18% perf-c2c.HITM.remote
266436 +22.5% 326455 meminfo.Active
266436 +22.5% 326455 meminfo.Active(anon)
71309 +19.2% 85005 meminfo.Mapped
284474 +23.8% 352101 meminfo.Shmem
50.65 +1.4% 51.35 kbuild.buildtime_per_iteration
50.65 +1.4% 51.35 kbuild.real_time_per_iteration
162.16 +1.3% 164.26 kbuild.sys_time_per_iteration
4290772 +259.3% 15415002 kbuild.time.involuntary_context_switches
5389 +1.0% 5444 kbuild.time.percent_of_cpu_this_job_got
983.50 +1.3% 996.08 kbuild.time.system_time
17417 +2.3% 17819 kbuild.time.user_time
2898 +2.3% 2965 kbuild.user_time_per_iteration
66625 +22.7% 81776 proc-vmstat.nr_active_anon
1491011 +2.1% 1522533 proc-vmstat.nr_anon_pages
1496099 +2.2% 1529304 proc-vmstat.nr_inactive_anon
18140 +18.3% 21461 proc-vmstat.nr_mapped
9585 +1.5% 9732 proc-vmstat.nr_page_table_pages
71184 +23.7% 88073 proc-vmstat.nr_shmem
66625 +22.7% 81776 proc-vmstat.nr_zone_active_anon
1496099 +2.2% 1529304 proc-vmstat.nr_zone_inactive_anon
64689 +7.4% 69486 proc-vmstat.numa_huge_pte_updates
33355254 +7.4% 35829277 proc-vmstat.numa_pte_updates
377789 +4.4% 394459 proc-vmstat.pgactivate
2.67 +0.1 2.76 perf-stat.i.branch-miss-rate%
7.979e+08 +3.1% 8.224e+08 perf-stat.i.branch-misses
27.01 -1.2 25.84 perf-stat.i.cache-miss-rate%
2.753e+08 +3.3% 2.843e+08 perf-stat.i.cache-misses
8.884e+08 +11.4% 9.895e+08 perf-stat.i.cache-references
18965 +174.5% 52066 perf-stat.i.context-switches
1.08 +1.5% 1.09 perf-stat.i.cpi
1.538e+11 +1.2% 1.557e+11 perf-stat.i.cpu-cycles
702.89 -5.5% 664.26 perf-stat.i.cpu-migrations
1.05 -1.0% 1.04 perf-stat.i.ipc
12.50 -10.1% 11.23 ? 2% perf-stat.i.major-faults
329354 -1.1% 325687 perf-stat.i.minor-faults
329366 -1.1% 325698 perf-stat.i.page-faults
2.30 +4.3% 2.40 perf-stat.overall.MPKI
3.07 +0.1 3.20 perf-stat.overall.branch-miss-rate%
30.99 -2.2 28.74 perf-stat.overall.cache-miss-rate%
1.28 +2.2% 1.31 perf-stat.overall.cpi
558.80 -2.0% 547.74 perf-stat.overall.cycles-between-cache-misses
0.78 -2.2% 0.76 perf-stat.overall.ipc
7.966e+08 +2.9% 8.199e+08 perf-stat.ps.branch-misses
2.748e+08 +3.2% 2.835e+08 perf-stat.ps.cache-misses
8.869e+08 +11.2% 9.865e+08 perf-stat.ps.cache-references
18918 +174.4% 51908 perf-stat.ps.context-switches
1.536e+11 +1.1% 1.553e+11 perf-stat.ps.cpu-cycles
701.06 -5.6% 662.09 perf-stat.ps.cpu-migrations
1.197e+11 -1.1% 1.183e+11 perf-stat.ps.instructions
12.47 -10.2% 11.20 ? 2% perf-stat.ps.major-faults
328614 -1.2% 324786 perf-stat.ps.minor-faults
328627 -1.2% 324797 perf-stat.ps.page-faults
0.73 -45.9% 0.40 sched_debug.cfs_rq:/.h_nr_running.avg
2.06 ? 6% -23.0% 1.58 ? 8% sched_debug.cfs_rq:/.h_nr_running.max
0.33 -50.0% 0.17 sched_debug.cfs_rq:/.h_nr_running.min
0.38 ? 6% -18.2% 0.31 ? 3% sched_debug.cfs_rq:/.h_nr_running.stddev
1689 -48.8% 864.97 sched_debug.cfs_rq:/.load.min
1.61 ? 4% -48.3% 0.83 sched_debug.cfs_rq:/.load_avg.min
0.38 ? 2% -41.7% 0.22 sched_debug.cfs_rq:/.nr_running.avg
0.33 -50.0% 0.17 sched_debug.cfs_rq:/.nr_running.min
0.16 ? 8% +19.4% 0.20 ? 3% sched_debug.cfs_rq:/.nr_running.stddev
741.02 ? 2% -43.4% 419.70 sched_debug.cfs_rq:/.runnable_avg.avg
1841 ? 4% -25.7% 1368 ? 2% sched_debug.cfs_rq:/.runnable_avg.max
310.50 ? 7% -49.2% 157.89 ? 7% sched_debug.cfs_rq:/.runnable_avg.min
328.61 ? 6% -21.0% 259.55 ? 3% sched_debug.cfs_rq:/.runnable_avg.stddev
404.14 ? 3% -38.5% 248.50 ? 2% sched_debug.cfs_rq:/.util_avg.avg
181.06 ? 13% -51.2% 88.44 ? 14% sched_debug.cfs_rq:/.util_avg.min
64.30 ? 8% -18.5% 52.39 ? 2% sched_debug.cfs_rq:/.util_est.avg
92.86 ? 13% -24.6% 70.04 ? 4% sched_debug.cfs_rq:/.util_est.stddev
720896 +17.0% 843275 ? 3% sched_debug.cpu.avg_idle.avg
11.27 ? 12% -39.6% 6.81 ? 5% sched_debug.cpu.clock.stddev
52931 -42.2% 30600 ? 2% sched_debug.cpu.curr->pid.avg
7508 ? 22% +67.6% 12586 ? 11% sched_debug.cpu.curr->pid.stddev
0.73 -45.8% 0.40 ? 2% sched_debug.cpu.nr_running.avg
0.33 -50.0% 0.17 sched_debug.cpu.nr_running.min
0.38 ? 9% -18.5% 0.31 ? 5% sched_debug.cpu.nr_running.stddev
30375 +163.9% 80145 sched_debug.cpu.nr_switches.avg
42800 ? 4% +120.8% 94514 ? 2% sched_debug.cpu.nr_switches.max
25282 +190.1% 73354 sched_debug.cpu.nr_switches.min
-107.64 -18.1% -88.19 sched_debug.cpu.nr_uninterruptible.min
36.76 ? 4% -17.1% 30.49 ? 5% sched_debug.cpu.nr_uninterruptible.stddev
4.63 ? 6% -1.9 2.69 ? 8% perf-profile.calltrace.cycles-pp.common_startup_64
4.57 ? 7% -1.9 2.66 ? 9% perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.common_startup_64
4.57 ? 7% -1.9 2.66 ? 9% perf-profile.calltrace.cycles-pp.start_secondary.common_startup_64
4.57 ? 7% -1.9 2.66 ? 9% perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.common_startup_64
4.55 ? 7% -1.9 2.65 ? 9% perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.common_startup_64
4.46 ? 7% -1.8 2.62 ? 9% perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
4.42 ? 7% -1.8 2.61 ? 9% perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
3.97 ? 7% -1.5 2.44 ? 9% perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
0.64 ? 2% -0.1 0.59 ? 3% perf-profile.calltrace.cycles-pp.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
1.40 ? 2% +0.1 1.46 perf-profile.calltrace.cycles-pp.open64
1.00 ? 2% +0.1 1.06 ? 2% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
0.99 ? 2% +0.1 1.06 ? 2% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.81 +0.1 1.92 perf-profile.calltrace.cycles-pp.malloc
0.72 ? 4% +0.1 0.87 ? 2% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt
0.35 ? 70% +0.2 0.56 ? 2% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
4.63 ? 6% -1.9 2.69 ? 8% perf-profile.children.cycles-pp.common_startup_64
4.63 ? 6% -1.9 2.69 ? 8% perf-profile.children.cycles-pp.cpu_startup_entry
4.63 ? 6% -1.9 2.69 ? 8% perf-profile.children.cycles-pp.do_idle
4.61 ? 6% -1.9 2.69 ? 8% perf-profile.children.cycles-pp.cpuidle_idle_call
4.57 ? 7% -1.9 2.66 ? 9% perf-profile.children.cycles-pp.start_secondary
4.51 ? 6% -1.9 2.65 ? 8% perf-profile.children.cycles-pp.cpuidle_enter
4.51 ? 6% -1.9 2.65 ? 8% perf-profile.children.cycles-pp.cpuidle_enter_state
4.01 ? 6% -1.5 2.47 ? 8% perf-profile.children.cycles-pp.intel_idle
1.92 ? 3% -0.2 1.76 ? 2% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
1.56 ? 3% -0.1 1.48 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
0.27 ? 6% -0.1 0.21 ? 5% perf-profile.children.cycles-pp.irq_exit_rcu
0.07 ? 10% -0.0 0.05 ? 8% perf-profile.children.cycles-pp.free_pcppages_bulk
0.11 ? 4% -0.0 0.09 ? 4% perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
0.07 ? 6% -0.0 0.06 perf-profile.children.cycles-pp.perf_rotate_context
0.25 +0.0 0.26 perf-profile.children.cycles-pp.ggc_free(void*)
0.08 +0.0 0.09 ? 4% perf-profile.children.cycles-pp.dput
0.25 +0.0 0.27 perf-profile.children.cycles-pp._cpp_pop_context
0.27 ? 3% +0.0 0.29 ? 2% perf-profile.children.cycles-pp.mmap_region
0.08 ? 10% +0.0 0.10 ? 3% perf-profile.children.cycles-pp.__split_vma
0.13 ? 5% +0.0 0.15 ? 3% perf-profile.children.cycles-pp.do_vmi_align_munmap
0.12 ? 4% +0.0 0.14 ? 3% perf-profile.children.cycles-pp.ksys_mmap_pgoff
0.32 ? 3% +0.0 0.34 perf-profile.children.cycles-pp.mark_exp_read(tree_node*)
0.12 ? 5% +0.0 0.14 perf-profile.children.cycles-pp.do_vmi_munmap
0.11 ? 4% +0.0 0.14 ? 3% perf-profile.children.cycles-pp.update_load_avg
0.11 ? 6% +0.0 0.13 ? 5% perf-profile.children.cycles-pp.next_uptodate_folio
0.20 ? 4% +0.0 0.23 ? 3% perf-profile.children.cycles-pp.filemap_map_pages
0.35 +0.0 0.37 ? 2% perf-profile.children.cycles-pp.walk_component
0.32 ? 3% +0.0 0.35 ? 2% perf-profile.children.cycles-pp.do_mmap
0.34 ? 3% +0.0 0.37 ? 2% perf-profile.children.cycles-pp.vm_mmap_pgoff
0.29 ? 3% +0.0 0.32 ? 3% perf-profile.children.cycles-pp.lookup_name(tree_node*)
0.22 ? 4% +0.0 0.25 ? 3% perf-profile.children.cycles-pp.do_read_fault
0.05 +0.0 0.08 ? 5% perf-profile.children.cycles-pp.smpboot_thread_fn
0.27 ? 4% +0.0 0.30 ? 3% perf-profile.children.cycles-pp.do_fault
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__rseq_handle_notify_resume
0.00 +0.1 0.05 perf-profile.children.cycles-pp.__update_load_avg_se
1.20 ? 2% +0.1 1.26 perf-profile.children.cycles-pp.path_openat
1.41 ? 2% +0.1 1.47 perf-profile.children.cycles-pp.open64
0.00 +0.1 0.06 ? 7% perf-profile.children.cycles-pp.run_ksoftirqd
1.21 ? 2% +0.1 1.28 perf-profile.children.cycles-pp.do_filp_open
1.42 ? 2% +0.1 1.49 perf-profile.children.cycles-pp.__x64_sys_openat
1.41 ? 2% +0.1 1.48 perf-profile.children.cycles-pp.do_sys_openat2
1.84 +0.1 1.95 perf-profile.children.cycles-pp.malloc
0.00 +0.1 0.11 ? 4% perf-profile.children.cycles-pp.pick_next_task_fair
4.32 +0.1 4.43 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
4.30 +0.1 4.42 perf-profile.children.cycles-pp.do_syscall_64
0.13 ? 5% +0.2 0.30 ? 3% perf-profile.children.cycles-pp.__schedule
0.12 ? 7% +0.2 0.30 ? 4% perf-profile.children.cycles-pp.schedule
0.20 ? 4% +0.2 0.43 perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
4.01 ? 6% -1.5 2.47 ? 8% perf-profile.self.cycles-pp.intel_idle
0.09 ? 9% +0.0 0.11 ? 4% perf-profile.self.cycles-pp.next_uptodate_folio
0.28 ? 3% +0.0 0.31 ? 3% perf-profile.self.cycles-pp.lookup_name(tree_node*)
1.71 +0.1 1.80 perf-profile.self.cycles-pp.malloc





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


2024-06-03 02:58:24

by Honglei Wang

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



On 2024/5/29 22:31, Chunxin Zang wrote:
>
>
>> On May 25, 2024, at 19:48, Honglei Wang <[email protected]> wrote:
>>
>>
>>
>> On 2024/5/24 21:40, Chunxin Zang wrote:
>>> I found that some tasks have been running for a long enough time and
>>> have become illegal, but they are still not releasing the CPU. This
>>> will increase the scheduling delay of other processes. Therefore, I
>>> tried checking the current process in wakeup_preempt and entity_tick,
>>> and if it is illegal, reschedule that cfs queue.
>>> The modification can reduce the scheduling delay by about 30% when
>>> RUN_TO_PARITY is enabled.
>>> So far, it has been running well in my test environment, and I have
>>> pasted some test results below.
>>> I isolated four cores for testing. I ran Hackbench in the background
>>> and observed the test results of cyclictest.
>>> hackbench -g 4 -l 100000000 &
>>> cyclictest --mlockall -D 5m -q
>>> EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY
>>> # Min Latencies: 00006 00006 00006 00006
>>> LNICE(-19) # Avg Latencies: 00191 00122 00089 00066
>>> # Max Latencies: 15442 07648 14133 07713
>>> # Min Latencies: 00006 00010 00006 00006
>>> LNICE(0) # Avg Latencies: 00466 00277 00289 00257
>>> # Max Latencies: 38917 32391 32665 17710
>>> # Min Latencies: 00019 00053 00010 00013
>>> LNICE(19) # Avg Latencies: 37151 31045 18293 23035
>>> # Max Latencies: 2688299 7031295 426196 425708
>>> I'm actually a bit hesitant about placing this modification under the
>>> NO_PARITY feature. This is because the modification conflicts with the
>>> semantics of RUN_TO_PARITY. So, I captured and compared the number of
>>> resched occurrences in wakeup_preempt to see if it introduced any
>>> additional overhead.
>>> Similarly, hackbench is used to stress the utilization of four cores to
>>> 100%, and the method for capturing the number of PREEMPT occurrences is
>>> referenced from [1].
>>> schedstats EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY CFS(6.5)
>>> stats.check_preempt_count 5053054 5057286 5003806 5018589 5031908
>>> stats.patch_cause_preempt_count ------- 858044 ------- 765726 -------
>>> stats.need_preempt_count 570520 858684 3380513 3426977 1140821
>>> From the above test results, there is a slight increase in the number of
>>> resched occurrences in wakeup_preempt. However, the results vary with each
>>> test, and sometimes the difference is not that significant. But overall,
>>> the count of reschedules remains lower than that of CFS and is much less
>>> than that of NO_PARITY.
>>> [1]: https://lore.kernel.org/all/[email protected]/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
>>> Signed-off-by: Chunxin Zang <[email protected]>
>>> Reviewed-by: Chen Yang <[email protected]>
>>> ---
>>> kernel/sched/fair.c | 6 ++++++
>>> 1 file changed, 6 insertions(+)
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>> return;
>>> #endif
>>> +
>>> + if (!entity_eligible(cfs_rq, curr))
>>> + resched_curr(rq_of(cfs_rq));
>>> }
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>> find_matching_se(&se, &pse);
>>> WARN_ON_ONCE(!pse);
>>>
>> Hi Chunxin,
>>
>> Did you run a comparative test to see which modification is more helpful on improve the latency? Modification at tick point makes more sense to me. But, seems just resched arbitrarily in wakeup might introduce too much preemption (and maybe more context switch?) in complex environment such as cgroup hierarchy.
>>
>> Thanks,
>> Honglei
>
> Hi Honglei
>
> I attempted to build a slightly more complex scenario. It consists of 4 isolated cores,
> 4 groups of hackbench (160 processes in total) to stress the CPU, and 1 cyclictest
> process to test scheduling latency. Using cgroup v2, to created 64 cgroup leaf nodes
> in a binary tree structure (with a depth of 7). I then evenly distributed the aforementioned
> 161 processes across the 64 cgroups respectively, and observed the scheduling delay
> performance of cyclictest.
>
> Unfortunately, the test results were very fluctuating, and the two sets of data were very
> close to each other. I suspect that it might be due to too few processes being distributed
> in each cgroup, which led to the logic for determining ineligible always succeeding and
> following the original logic. Later, I will attempt more tests to verify the impact of these
> modifications in scenarios involving multiple cgroups.
>

Sorry for the late reply, I was a bit busy last week. How's the testing
going? What about running some workloads that spend more time in the
kernel? That may be worth a try, but it depends on your test plan.

Thanks,
Honglei

> thanks
> Chunxin
>
>


2024-06-05 17:20:09

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

Hi Prateek, Chunxin,

On 2024-05-28 at 10:32:23 +0530, K Prateek Nayak wrote:
> Hello Chunxin,
>
> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
> >
> >> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
> >>
> >> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
> >>> I found that some tasks have been running for a long enough time and
> >>> have become illegal, but they are still not releasing the CPU. This
> >>> will increase the scheduling delay of other processes. Therefore, I
> >>> tried checking the current process in wakeup_preempt and entity_tick,
> >>> and if it is illegal, reschedule that cfs queue.
> >>>
> >>> The modification can reduce the scheduling delay by about 30% when
> >>> RUN_TO_PARITY is enabled.
> >>> So far, it has been running well in my test environment, and I have
> >>> pasted some test results below.
> >>>
> >>
> >> Interesting, besides hackbench, I assume that you have workload in
> >> real production environment that is sensitive to wakeup latency?
> >
> > Hi Chen
> >
> > Yes, my workload are quite sensitive to wakeup latency .
> >>
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 03be0d1330a6..a0005d240db5 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
> >>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
> >>> return;
> >>> #endif
> >>> +
> >>> + if (!entity_eligible(cfs_rq, curr))
> >>> + resched_curr(rq_of(cfs_rq));
> >>> }
> >>>
> >>
> >> entity_tick() -> update_curr() -> update_deadline():
> >> se->vruntime >= se->deadline ? resched_curr()
> >> only current has expired its slice will it be scheduled out.
> >>
> >> So here you want to schedule current out if its lag becomes 0.
> >>
> >> In lastest sched/eevdf branch, it is controlled by two sched features:
> >> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
> >> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
> >> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
> >>
> >> Maybe something like this can achieve your goal
> >> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
> >> resched_curr
> >>
> >>>
> >>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> >>> return;
> >>>
> >>> + if (!entity_eligible(cfs_rq, se))
> >>> + goto preempt;
> >>> +
> >>
> >> Not sure if this is applicable, later in this function, pick_eevdf() checks
> >> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
> >> be evicted. And this change does not consider the cgroup hierarchy.
>
> The above line will be referred to as [1] below.
>
> >>
> >> Besides, the check of current eligiblity can get false negative result,
> >> if the enqueued entity has a positive lag. Prateek proposed to
> >> remove the check of current's eligibility in pick_eevdf():
> >> https://lore.kernel.org/lkml/[email protected]/
> >
> > Thank you for letting me know about Peter's latest updates and thoughts.
> > Actually, the original intention of my modification was to minimize the
> > traversal of the rb-tree as much as possible. For example, in the following
> > scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
> > 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
> > resched, the scheduler will call 'pick_eevdf' again, traversing the
> > rb-tree once more. This ultimately results in the rb-tree being traversed
> > twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
> > and directly trigger a 'resched', it would reduce the traversal of the rb-tree
> > by one time.
> >
> >
> > wakeup_preempt -> pick_eevdf        ->        resched_curr
> >                       |-> 'traverse the rb-tree'     |
> >                                                 schedule -> pick_eevdf
> >                                                                 |-> 'traverse the rb-tree'
>
> I see what you mean but a couple of things:
>
> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
> below for ease of interpretation)
>
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 03be0d1330a6..a0005d240db5 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> > if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
> > return;
> >
> > + if (!entity_eligible(cfs_rq, se))
> > + goto preempt;
> > +
>
> This check uses the root cfs_rq since "task_cfs_rq()" returns the
> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
> CONFIG_SCHED_AUTOGROUP, there is a good chance this the task is queued
> on a higher order cfs_rq and this entity_eligible() calculation might
> not be valid since the vruntime calculation for the "se" is relative to
> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
> I believe that is what Chenyu was referring to in [1].
>

Sorry for the late reply, and thanks for helping clarify this. Yes, this is
what my previous concern was:
1. It does not consider the cgroup hierarchy and does not check preemption at
the same level, which is what find_matching_se() covers.
2. The if (!entity_eligible(cfs_rq, se)) check on current is redundant because
pick_eevdf() will check the eligibility of current later anyway. But as
Chunxin pointed out, his concern is the double traversal of the rb-tree.
I wonder if we could use cfs_rq->next to store the next candidate, so it can
be picked directly in the second pick as a fast path. Something like the
untested change below:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..f716646d595e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
{
struct task_struct *curr = rq->curr;
- struct sched_entity *se = &curr->se, *pse = &p->se;
+ struct sched_entity *se = &curr->se, *pse = &p->se, *next;
struct cfs_rq *cfs_rq = task_cfs_rq(curr);
int cse_is_idle, pse_is_idle;

@@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
/*
* XXX pick_eevdf(cfs_rq) != se ?
*/
- if (pick_eevdf(cfs_rq) == pse)
+ next = pick_eevdf(cfs_rq);
+ if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
+ set_next_buddy(next);
+
+ if (next == pse)
goto preempt;

return;


thanks,
Chenyu

> > find_matching_se(&se, &pse);
> > WARN_ON_ONCE(!pse);
> >
> > --
>
> In addition to that, There is an update_curr() call below for the first
> cfs_rq where both the entities' hierarchy is queued which is found by
> find_matching_se(). I believe that is required too to update the
> vruntime and deadline of the entity where preemption can happen.
>
> If you want to circumvent a second call to pick_eevdf(), could you
> perhaps do:
>
> (Only build tested)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9eb63573110c..653b1bee1e62 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> update_curr(cfs_rq);
>
> /*
> - * XXX pick_eevdf(cfs_rq) != se ?
> + * If the hierarchy of current task is ineligible at the common
> + * point on the newly woken entity, there is a good chance of
> + * wakeup preemption by the newly woken entity. Mark for resched
> + * and allow pick_eevdf() in schedule() to judge which task to
> + * run next.
> */
> - if (pick_eevdf(cfs_rq) == pse)
> + if (!entity_eligible(cfs_rq, se))
> goto preempt;
>
> return;
>
> --
>
> There are other implications here which is specifically highlighted by
> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
> entity is not the entity with the earliest eligible virtual deadline,
> the current task is still preempted if any other entity has the EEVD.
>
> Mike's box gave switching to above two thumbs up; I have to check what
> my box says :)
>
> Following are DeathStarBench results with your original patch compared
> to v6.9-rc5 based tip:sched/core:
>
> ==================================================================
> Test : DeathStarBench
> Why? : Some tasks here do not like aggressive preemption
> Units : Normalized throughput
> Interpretation: Higher is better
> Statistic : Mean
> ==================================================================
> Pinning scaling tip eager_preempt (pct imp)
> 1CCD 1 1.00 0.99 (%diff: -1.13%)
> 2CCD 2 1.00 0.97 (%diff: -3.21%)
> 4CCD 3 1.00 0.97 (%diff: -3.41%)
> 8CCD 6 1.00 0.97 (%diff: -3.20%)
> --
>
> I'll give the variants mentioned in the thread a try too to see if
> some of my assumptions around heavy preemption hold good. I was also
> able to dig up an old patch by Balakumaran Kannan which skipped
> pick_eevdf() altogether if "pse" is ineligible which also seems like
> a good optimization based on current check in
> check_preempt_wakeup_fair() but it perhaps doesn't help the case of
> wakeup-latency sensitivity you are optimizing for; only reduces
> rb-tree traversal if there is no chance of pick_eevdf() returning "pse"
> https://lore.kernel.org/lkml/[email protected]/
>
> --
> Thanks and Regards,
> Prateek
>
> >
> >
> > Of course, this would break the semantics of RESPECT_SLICE as well as
> > RUN_TO_PARITY. So, this might be considered a performance enhancement
> > for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
> >
> > thanks
> > Chunxin
> >
> >
> >> If I understand your requirement correctly, you want to reduce the wakeup
> >> latency. There are some codes under developed by Peter, which could
> >> customized task's wakeup latency via setting its slice:
> >> https://lore.kernel.org/lkml/[email protected]/
> >>
> >> thanks,
> >> Chenyu
>

2024-06-06 01:47:25

by Chunxin Zang

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



> On Jun 6, 2024, at 01:19, Chen Yu <[email protected]> wrote:
>
> Hi Prateek, Chunxin,
>
> On 2024-05-28 at 10:32:23 +0530, K Prateek Nayak wrote:
>> Hello Chunxin,
>>
>> On 5/28/2024 8:12 AM, Chunxin Zang wrote:
>>>
>>>> On May 24, 2024, at 23:30, Chen Yu <[email protected]> wrote:
>>>>
>>>> On 2024-05-24 at 21:40:11 +0800, Chunxin Zang wrote:
>>>>> I found that some tasks have been running for a long enough time and
>>>>> have become illegal, but they are still not releasing the CPU. This
>>>>> will increase the scheduling delay of other processes. Therefore, I
>>>>> tried checking the current process in wakeup_preempt and entity_tick,
>>>>> and if it is illegal, reschedule that cfs queue.
>>>>>
>>>>> The modification can reduce the scheduling delay by about 30% when
>>>>> RUN_TO_PARITY is enabled.
>>>>> So far, it has been running well in my test environment, and I have
>>>>> pasted some test results below.
>>>>>
>>>>
>>>> Interesting, besides hackbench, I assume that you have workload in
>>>> real production environment that is sensitive to wakeup latency?
>>>
>>> Hi Chen
>>>
>>> Yes, my workload are quite sensitive to wakeup latency .
>>>>
>>>>>
>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>> index 03be0d1330a6..a0005d240db5 100644
>>>>> --- a/kernel/sched/fair.c
>>>>> +++ b/kernel/sched/fair.c
>>>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>>>> return;
>>>>> #endif
>>>>> +
>>>>> + if (!entity_eligible(cfs_rq, curr))
>>>>> + resched_curr(rq_of(cfs_rq));
>>>>> }
>>>>>
>>>>
>>>> entity_tick() -> update_curr() -> update_deadline():
>>>> se->vruntime >= se->deadline ? resched_curr()
>>>> only current has expired its slice will it be scheduled out.
>>>>
>>>> So here you want to schedule current out if its lag becomes 0.
>>>>
>>>> In lastest sched/eevdf branch, it is controlled by two sched features:
>>>> RESPECT_SLICE: Inhibit preemption until the current task has exhausted it's slice.
>>>> RUN_TO_PARITY: Relax RESPECT_SLICE and only protect current until 0-lag.
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=sched/eevdf&id=e04f5454d68590a239092a700e9bbaf84270397c
>>>>
>>>> Maybe something like this can achieve your goal
>>>> if (sched_feat(RUN_TOPARITY) && !entity_eligible(cfs_rq, curr))
>>>> resched_curr
>>>>
>>>>>
>>>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>>>> return;
>>>>>
>>>>> + if (!entity_eligible(cfs_rq, se))
>>>>> + goto preempt;
>>>>> +
>>>>
>>>> Not sure if this is applicable, later in this function, pick_eevdf() checks
>>>> if the current is eligible, !entity_eligible(cfs_rq, curr), if not, curr will
>>>> be evicted. And this change does not consider the cgroup hierarchy.
>>
>> The above line will be referred to as [1] below.
>>
>>>>
>>>> Besides, the check of current eligibility can get a false negative result,
>>>> if the enqueued entity has a positive lag. Prateek proposed to
>>>> remove the check of current's eligibility in pick_eevdf():
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> Thank you for letting me know about Peter's latest updates and thoughts.
>>> Actually, the original intention of my modification was to minimize the
>>> traversal of the rb-tree as much as possible. For example, in the following
>>> scenario, if 'curr' is ineligible, the system would still traverse the rb-tree in
>>> 'pick_eevdf' to return an optimal 'se', and then trigger 'resched_curr'. After
>>> resched, the scheduler will call 'pick_eevdf' again, traversing the
>>> rb-tree once more. This ultimately results in the rb-tree being traversed
>>> twice. If it's possible to determine that 'curr' is ineligible within 'wakeup_preempt'
>>> and directly trigger a 'resched', it would reduce the traversal of the rb-tree
>>> by one time.
>>>
>>>
>>> wakeup_preempt-> pick_eevdf -> resched_curr
>>> |->'traverse the rb-tree' |
>>> schedule->pick_eevdf
>>> |->'traverse the rb-tree'
>>
>> I see what you mean but a couple of things:
>>
>> (I'm adding the check_preempt_wakeup_fair() hunk from the original patch
>> below for ease of interpretation)
>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 03be0d1330a6..a0005d240db5 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>> return;
>>>
>>> + if (!entity_eligible(cfs_rq, se))
>>> + goto preempt;
>>> +
>>
>> This check uses the root cfs_rq since "task_cfs_rq()" returns the
>> "rq->cfs" of the runqueue the task is on. In presence of cgroups or
>> CONFIG_SCHED_AUTOGROUP, there is a good chance that the task is queued
>> on a higher order cfs_rq and this entity_eligible() calculation might
>> not be valid since the vruntime calculation for the "se" is relative to
>> the "cfs_rq" where it is queued on. Please correct me if I'm wrong but
>> I believe that is what Chenyu was referring to in [1].
>>
>
> Sorry for the late reply and thanks for help clarify this. Yes, this is
> what my previous concern was:
> 1. It does not consider the cgroup and does not check preemption in the same
> level which is covered by find_matching_se().
> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> later pick_eevdf() will check the eligible of current anyway. But
> as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
> I just wonder if we could leverage the cfs_rq->next to store the next
> candidate, so it can be picked directly in the 2nd pick as a fast path?
> Something like below untested:
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..f716646d595e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> {
> struct task_struct *curr = rq->curr;
> - struct sched_entity *se = &curr->se, *pse = &p->se;
> + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> int cse_is_idle, pse_is_idle;
>
> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> /*
> * XXX pick_eevdf(cfs_rq) != se ?
> */
> - if (pick_eevdf(cfs_rq) == pse)
> + next = pick_eevdf(cfs_rq);
> + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> + set_next_buddy(next);
> +
> + if (next == pse)
> goto preempt;
>
> return;
>
>
> thanks,
> Chenyu

Hi Chen

First of all, thank you for your patient response. Regarding the issue of avoiding traversing
the RB-tree twice, I initially had two methods in mind.
1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
This idea is similar to the one you proposed this time.
2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.'
Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
process to schedule' are two different things. 'check_preempt_wakeup_fair' is not just to
check if the newly awakened process should preempt the current process; it can also serve
as an opportunity to check whether any other processes should preempt the current one,
thereby improving the real-time responsiveness of the scheduler. Although pick_eevdf now also
evaluates the eligibility of 'curr', the current process will still not be preempted if the entity
returned is not the awakened process. Therefore, I posted the v2 PATCH; its implementation
might express this point more clearly.
https://lore.kernel.org/lkml/[email protected]/T/

I previously implemented and tested both of these methods, and the test results showed that
method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
think about it, perhaps method 1 could also be viable at the same time. :)

thanks
Chunxin

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..f67894d8fbc8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -563,6 +563,8 @@ static inline s64 entity_key(struct cfs_rq *cfs_rq, struct sched_entity *se)
return (s64)(se->vruntime - cfs_rq->min_vruntime);
}

+static void unset_pick_cached(struct cfs_rq *cfs_rq);
+
#define __node_2_se(node) \
rb_entry((node), struct sched_entity, run_node)

@@ -632,6 +634,8 @@ avg_vruntime_add(struct cfs_rq *cfs_rq, struct sched_entity *se)

cfs_rq->avg_vruntime += key * weight;
cfs_rq->avg_load += weight;
+
+ unset_pick_cached(cfs_rq);
}

static void
@@ -642,6 +646,8 @@ avg_vruntime_sub(struct cfs_rq *cfs_rq, struct sched_entity *se)

cfs_rq->avg_vruntime -= key * weight;
cfs_rq->avg_load -= weight;
+
+ unset_pick_cached(cfs_rq);
}

static inline
@@ -651,6 +657,8 @@ void avg_vruntime_update(struct cfs_rq *cfs_rq, s64 delta)
* v' = v + d ==> avg_vruntime' = avg_runtime - d*avg_load
*/
cfs_rq->avg_vruntime -= cfs_rq->avg_load * delta;
+
+ unset_pick_cached(cfs_rq);
}

/*
@@ -745,6 +753,36 @@ int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se)
return vruntime_eligible(cfs_rq, se->vruntime);
}

+static struct sched_entity *try_to_get_pick_cached(struct cfs_rq* cfs_rq)
+{
+ struct sched_entity *se;
+
+ se = cfs_rq->pick_cached;
+
+ return se == NULL ? NULL : (se->on_rq ? se : NULL);
+}
+
+static void unset_pick_cached(struct cfs_rq *cfs_rq)
+{
+ cfs_rq->pick_cached = NULL;
+}
+
+static void set_pick_cached(struct sched_entity *se)
+{
+ if (!se || !se->on_rq)
+ return;
+
+ cfs_rq_of(se)->pick_cached = se;
+}
+
static u64 __update_min_vruntime(struct cfs_rq *cfs_rq, u64 vruntime)
{
u64 min_vruntime = cfs_rq->min_vruntime;
@@ -856,6 +894,51 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
return __node_2_se(left);
}

+static struct sched_entity *__pick_eevdf(struct cfs_rq *cfs_rq)
+{
+ struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
+ struct sched_entity *se = __pick_first_entity(cfs_rq);
+ struct sched_entity *best = NULL;
+
+ /* Pick the leftmost entity if it's eligible */
+ if (se && entity_eligible(cfs_rq, se))
+ return se;
+
+ /* Heap search for the EEVD entity */
+ while (node) {
+ struct rb_node *left = node->rb_left;
+
+ /*
+ * Eligible entities in left subtree are always better
+ * choices, since they have earlier deadlines.
+ */
+ if (left && vruntime_eligible(cfs_rq,
+ __node_2_se(left)->min_vruntime)) {
+ node = left;
+ continue;
+ }
+
+ se = __node_2_se(node);
+
+ /*
+ * The left subtree either is empty or has no eligible
+ * entity, so check the current node since it is the one
+ * with earliest deadline that might be eligible.
+ */
+ if (entity_eligible(cfs_rq, se)) {
+ best = se;
+ break;
+ }
+
+ node = node->rb_right;
+ }
+
+ if (best)
+ set_pick_cached(best);
+
+ return best;
+}
+
/*
* Earliest Eligible Virtual Deadline First
*
@@ -877,7 +960,6 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
*/
static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
{
- struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
struct sched_entity *se = __pick_first_entity(cfs_rq);
struct sched_entity *curr = cfs_rq->curr;
struct sched_entity *best = NULL;
@@ -899,41 +981,13 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
if (sched_feat(RUN_TO_PARITY) && curr && curr->vlag == curr->deadline)
return curr;

- /* Pick the leftmost entity if it's eligible */
- if (se && entity_eligible(cfs_rq, se)) {
- best = se;
- goto found;
- }
+ best = try_to_get_pick_cached(cfs_rq);
+ if (best && !entity_eligible(cfs_rq, best))
+ best = NULL;

- /* Heap search for the EEVD entity */
- while (node) {
- struct rb_node *left = node->rb_left;
-
- /*
- * Eligible entities in left subtree are always better
- * choices, since they have earlier deadlines.
- */
- if (left && vruntime_eligible(cfs_rq,
- __node_2_se(left)->min_vruntime)) {
- node = left;
- continue;
- }
-
- se = __node_2_se(node);
+ if (!best)
+ best = __pick_eevdf(cfs_rq);

- /*
- * The left subtree either is empty or has no eligible
- * entity, so check the current node since it is the one
- * with earliest deadline that might be eligible.
- */
- if (entity_eligible(cfs_rq, se)) {
- best = se;
- break;
- }
-
- node = node->rb_right;
- }
-found:
if (!best || (curr && entity_before(curr, best)))
best = curr;

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2242679239e..373241075449 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -597,6 +597,7 @@ struct cfs_rq {
*/
struct sched_entity *curr;
struct sched_entity *next;
+ struct sched_entity *pick_cached;

#ifdef CONFIG_SCHED_DEBUG
unsigned int nr_spread_over;
--
2.34.1


>
>>> find_matching_se(&se, &pse);
>>> WARN_ON_ONCE(!pse);
>>>
>>> --
>>
>> In addition to that, There is an update_curr() call below for the first
>> cfs_rq where both the entities' hierarchy is queued which is found by
>> find_matching_se(). I believe that is required too to update the
>> vruntime and deadline of the entity where preemption can happen.
>>
>> If you want to circumvent a second call to pick_eevdf(), could you
>> perhaps do:
>>
>> (Only build tested)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9eb63573110c..653b1bee1e62 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -8407,9 +8407,13 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>> update_curr(cfs_rq);
>>
>> /*
>> - * XXX pick_eevdf(cfs_rq) != se ?
>> + * If the hierarchy of current task is ineligible at the common
>> + * point on the newly woken entity, there is a good chance of
>> + * wakeup preemption by the newly woken entity. Mark for resched
>> + * and allow pick_eevdf() in schedule() to judge which task to
>> + * run next.
>> */
>> - if (pick_eevdf(cfs_rq) == pse)
>> + if (!entity_eligible(cfs_rq, se))
>> goto preempt;
>>
>> return;
>>
>> --
>>
>> There are other implications here which is specifically highlighted by
>> the "XXX pick_eevdf(cfs_rq) != se ?" comment. If the current waking
>> entity is not the entity with the earliest eligible virtual deadline,
>> the current task is still preempted if any other entity has the EEVD.
>>
>> Mike's box gave switching to above two thumbs up; I have to check what
>> my box says :)
>>
>> Following are DeathStarBench results with your original patch compared
>> to v6.9-rc5 based tip:sched/core:
>>
>> ==================================================================
>> Test : DeathStarBench
>> Why? : Some tasks here do not like aggressive preemption
>> Units : Normalized throughput
>> Interpretation: Higher is better
>> Statistic : Mean
>> ==================================================================
>> Pinning scaling tip eager_preempt (pct imp)
>> 1CCD 1 1.00 0.99 (%diff: -1.13%)
>> 2CCD 2 1.00 0.97 (%diff: -3.21%)
>> 4CCD 3 1.00 0.97 (%diff: -3.41%)
>> 8CCD 6 1.00 0.97 (%diff: -3.20%)
>> --
>>
>> I'll give the variants mentioned in the thread a try too to see if
>> some of my assumptions around heavy preemption hold good. I was also
>> able to dig up an old patch by Balakumaran Kannan which skipped
>> pick_eevdf() altogether if "pse" is ineligible which also seems like
>> a good optimization based on current check in
>> check_preempt_wakeup_fair() but it perhaps doesn't help the case of
>> wakeup-latency sensitivity you are optimizing for; only reduces
>> rb-tree traversal if there is no chance of pick_eevdf() returning "pse"
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> --
>> Thanks and Regards,
>> Prateek
>>
>>>
>>>
>>> Of course, this would break the semantics of RESPECT_SLICE as well as
>>> RUN_TO_PARITY. So, this might be considered a performance enhancement
>>> for scenarios without NO_RESPECT_SLICE/NO_RUN_TO_PARITY.
>>>
>>> thanks
>>> Chunxin
>>>
>>>
>>>> If I understand your requirement correctly, you want to reduce the wakeup
>>>> latency. There are some codes under developed by Peter, which could
>>>> customized task's wakeup latency via setting its slice:
>>>> https://lore.kernel.org/lkml/[email protected]/
>>>>
>>>> thanks,
>>>> Chenyu



2024-06-06 12:40:17

by Chunxin Zang

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



> On Jun 3, 2024, at 10:55, Honglei Wang <[email protected]> wrote:
>
>
>
> On 2024/5/29 22:31, Chunxin Zang wrote:
>>> On May 25, 2024, at 19:48, Honglei Wang <[email protected]> wrote:
>>>
>>>
>>>
>>> On 2024/5/24 21:40, Chunxin Zang wrote:
>>>> I found that some tasks have been running for a long enough time and
>>>> have become illegal, but they are still not releasing the CPU. This
>>>> will increase the scheduling delay of other processes. Therefore, I
>>>> tried checking the current process in wakeup_preempt and entity_tick,
>>>> and if it is illegal, reschedule that cfs queue.
>>>> The modification can reduce the scheduling delay by about 30% when
>>>> RUN_TO_PARITY is enabled.
>>>> So far, it has been running well in my test environment, and I have
>>>> pasted some test results below.
>>>> I isolated four cores for testing. I ran Hackbench in the background
>>>> and observed the test results of cyclictest.
>>>> hackbench -g 4 -l 100000000 &
>>>> cyclictest --mlockall -D 5m -q
>>>> EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY
>>>> # Min Latencies: 00006 00006 00006 00006
>>>> LNICE(-19) # Avg Latencies: 00191 00122 00089 00066
>>>> # Max Latencies: 15442 07648 14133 07713
>>>> # Min Latencies: 00006 00010 00006 00006
>>>> LNICE(0) # Avg Latencies: 00466 00277 00289 00257
>>>> # Max Latencies: 38917 32391 32665 17710
>>>> # Min Latencies: 00019 00053 00010 00013
>>>> LNICE(19) # Avg Latencies: 37151 31045 18293 23035
>>>> # Max Latencies: 2688299 7031295 426196 425708
>>>> I'm actually a bit hesitant about placing this modification under the
>>>> NO_PARITY feature. This is because the modification conflicts with the
>>>> semantics of RUN_TO_PARITY. So, I captured and compared the number of
>>>> resched occurrences in wakeup_preempt to see if it introduced any
>>>> additional overhead.
>>>> Similarly, hackbench is used to stress the utilization of four cores to
>>>> 100%, and the method for capturing the number of PREEMPT occurrences is
>>>> referenced from [1].
>>>> schedstats EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY CFS(6.5)
>>>> stats.check_preempt_count 5053054 5057286 5003806 5018589 5031908
>>>> stats.patch_cause_preempt_count ------- 858044 ------- 765726 -------
>>>> stats.need_preempt_count 570520 858684 3380513 3426977 1140821
>>>> From the above test results, there is a slight increase in the number of
>>>> resched occurrences in wakeup_preempt. However, the results vary with each
>>>> test, and sometimes the difference is not that significant. But overall,
>>>> the count of reschedules remains lower than that of CFS and is much less
>>>> than that of NO_PARITY.
>>>> [1]: https://lore.kernel.org/all/[email protected]/T/#m52057282ceb6203318be1ce9f835363de3bef5cb
>>>> Signed-off-by: Chunxin Zang <[email protected]>
>>>> Reviewed-by: Chen Yang <[email protected]>
>>>> ---
>>>> kernel/sched/fair.c | 6 ++++++
>>>> 1 file changed, 6 insertions(+)
>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>> index 03be0d1330a6..a0005d240db5 100644
>>>> --- a/kernel/sched/fair.c
>>>> +++ b/kernel/sched/fair.c
>>>> @@ -5523,6 +5523,9 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
>>>> hrtimer_active(&rq_of(cfs_rq)->hrtick_timer))
>>>> return;
>>>> #endif
>>>> +
>>>> + if (!entity_eligible(cfs_rq, curr))
>>>> + resched_curr(rq_of(cfs_rq));
>>>> }
>>>> @@ -8325,6 +8328,9 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>>> if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
>>>> return;
>>>> + if (!entity_eligible(cfs_rq, se))
>>>> + goto preempt;
>>>> +
>>>> find_matching_se(&se, &pse);
>>>> WARN_ON_ONCE(!pse);
>>>>
>>> Hi Chunxin,
>>>
>>> Did you run a comparative test to see which of the two modifications helps more in improving the latency? The modification at the tick point makes more sense to me. But it seems that rescheduling arbitrarily on wakeup might introduce too much preemption (and maybe more context switches?) in a complex environment such as a cgroup hierarchy.
>>>
>>> Thanks,
>>> Honglei
>> Hi Honglei
>> I attempted to build a slightly more complex scenario. It consists of 4 isolated cores,
>> 4 groups of hackbench (160 processes in total) to stress the CPU, and 1 cyclictest
>> process to test scheduling latency. Using cgroup v2, to created 64 cgroup leaf nodes
>> in a binary tree structure (with a depth of 7). I then evenly distributed the aforementioned
>> 161 processes across the 64 cgroups respectively, and observed the scheduling delay
>> performance of cyclictest.
>> Unfortunately, the test results were very fluctuating, and the two sets of data were very
>> close to each other. I suspect that it might be due to too few processes being distributed
>> in each cgroup, which led to the logic for determining ineligible always succeeding and
>> following the original logic. Later, I will attempt more tests to verify the impact of these
>> modifications in scenarios involving multiple cgroups.
>
> Sorry for the late reply, I was a bit busy last week. How's the testing going? What about running some workload processes that spend more time in the kernel? Maybe it's worth a try, but it depends on your test plan.
>

Hi honglei

Recently, I conducted multiple-cgroup testing using the v2 patch. The v2 patch preserves the
RUN_TO_PARITY semantics, so the test results show a larger improvement under the
NO_RUN_TO_PARITY feature.
https://lore.kernel.org/lkml/[email protected]/T/

The testing environment I used still employed 4 cores, 4 groups of hackbench (160 processes)
and 1 cyclictest. If too many cgroups or processes are created on the 4 cores, the test
results will fluctuate severely, making it difficult to discern any differences.

The organization of cgroups was in two forms:
1. Within the same level of the cgroup hierarchy, 10 sub-cgroups were created, with each cgroup
holding an average of 16 processes.

EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY

LNICE(-19) # Avg Latencies: 00572 00347 00502 00218

LNICE(0) # Avg Latencies: 02262 02225 02442 02321

LNICE(19) # Avg Latencies: 03132 03422 03333 03489

2. In the form of a binary tree, 8 leaf cgroups were established, with a depth of 4.
On average, each cgroup had 20 processes

EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY

LNICE(-19) # Avg Latencies: 00601 00592 00510 00400

LNICE(0) # Avg Latencies: 02703 02170 02381 02126

LNICE(19) # Avg Latencies: 04773 03387 04478 03611

Based on the test results, there is a noticeable improvement in scheduling latency after
applying the patch in scenarios involving multiple cgroups.


thanks
Chunxin

> Thanks,
> Honglei
>
>> thanks
>> Chunxin



2024-06-07 02:38:53

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
>
>
> > On Jun 6, 2024, at 01:19, Chen Yu <[email protected]> wrote:
> >
> >
> > Sorry for the late reply and thanks for help clarify this. Yes, this is
> > what my previous concern was:
> > 1. It does not consider the cgroup and does not check preemption in the same
> > level which is covered by find_matching_se().
> > 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> > later pick_eevdf() will check the eligible of current anyway. But
> > as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
> > I just wonder if we could leverage the cfs_rq->next to store the next
> > candidate, so it can be picked directly in the 2nd pick as a fast path?
> > Something like below untested:
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..f716646d595e 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> > static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> > {
> > struct task_struct *curr = rq->curr;
> > - struct sched_entity *se = &curr->se, *pse = &p->se;
> > + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> > struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> > int cse_is_idle, pse_is_idle;
> >
> > @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> > /*
> > * XXX pick_eevdf(cfs_rq) != se ?
> > */
> > - if (pick_eevdf(cfs_rq) == pse)
> > + next = pick_eevdf(cfs_rq);
> > + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> > + set_next_buddy(next);
> > +
> > + if (next == pse)
> > goto preempt;
> >
> > return;
> >
> >
> > thanks,
> > Chenyu
>
> Hi Chen
>
> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> the RB-tree twice, I initially had two methods in mind.
> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
> This idea is similar to the one you proposed this time.
> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.'
> Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
> process to schedule' are two different things.

I agree, and it seems that in the current eevdf implementation the former relies on the latter.

> 'check_preempt_wakeup_fair' is not just to
> check if the newly awakened process should preempt the current process; it can also serve
> as an opportunity to check whether any other processes should preempt the current one,
> thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
> the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
> then the current process will still not be preempted.

I thought Mike has proposed a patch to deal with this scenario you mentioned above:
https://lore.kernel.org/lkml/[email protected]/

And I suppose you are referring to increasing the preemption chance on current rather than reducing
the number of pick_eevdf() invocations in check_preempt_wakeup_fair().

> Therefore, I posted the v2 PATCH.
> The implementation of v2 PATCH might express this point more clearly.
> https://lore.kernel.org/lkml/[email protected]/T/
>

Let me take a look at it and do some tests.

> I previously implemented and tested both of these methods, and the test results showed that
> method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
> think about it, perhaps method 1 could also be viable at the same time. :)
>

Actually I found that, even without any changes, if we enable the sched feature NEXT_BUDDY, the
wakeup latency and request latency are both reduced. The following is the schbench result on a
240 CPUs system:

NO_NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1698990 total samples)
       50.0th: 6 (429125 samples)
       90.0th: 14 (682355 samples)
      * 99.0th: 29 (126695 samples)
       99.9th: 529 (14603 samples)
       min=1, max=4741
Request Latencies percentiles (usec) runtime 100 (s) (1702523 total samples)
       50.0th: 14992 (550939 samples)
       90.0th: 15376 (668687 samples)
      * 99.0th: 15600 (128111 samples)
       99.9th: 15888 (11238 samples)
       min=3528, max=31677
RPS percentiles (requests) runtime 100 (s) (101 total samples)
       20.0th: 16864 (31 samples)
      * 50.0th: 16928 (26 samples)
       90.0th: 17248 (36 samples)
       min=16615, max=20041
average rps: 17025.23

NEXT_BUDDY
Wakeup Latencies percentiles (usec) runtime 100 (s) (1653564 total samples)
       50.0th: 5 (376845 samples)
       90.0th: 12 (632075 samples)
      * 99.0th: 24 (114398 samples)
       99.9th: 105 (13737 samples)
       min=1, max=7428
Request Latencies percentiles (usec) runtime 100 (s) (1657268 total samples)
       50.0th: 14480 (524763 samples)
       90.0th: 15216 (647982 samples)
      * 99.0th: 15472 (130730 samples)
       99.9th: 15728 (13980 samples)
       min=3542, max=34805
RPS percentiles (requests) runtime 100 (s) (101 total samples)
       20.0th: 16544 (62 samples)
      * 50.0th: 16544 (0 samples)
       90.0th: 16608 (37 samples)
       min=16470, max=16648
average rps: 16572.68

So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
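
For context, the reason the buddy can cut the tree walk short is that the pick path consults
cfs_rq->next before doing the full heap search. Roughly, as a simplified sketch (paraphrased
from the mainline code of this era, not verbatim):

static struct sched_entity *pick_next_entity(struct cfs_rq *cfs_rq)
{
	/*
	 * Fast path: if a buddy was remembered at wakeup time and it is
	 * still eligible, return it without walking the rb-tree.
	 */
	if (sched_feat(NEXT_BUDDY) &&
	    cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
		return cfs_rq->next;

	/* Slow path: full EEVDF search over the rb-tree. */
	return pick_eevdf(cfs_rq);
}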

thanks,
Chenyu

2024-06-11 12:21:51

by Honglei Wang

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



On 2024/6/6 20:39, Chunxin Zang wrote:

>
> Hi honglei
>
> Recently, I conducted testing of multiple cgroups using version 2. Version 2 ensures the
> RUN_TO_PARITY feature, so the test results are somewhat better under the
> NO_RUN_TO_PARITY feature.
> https://lore.kernel.org/lkml/[email protected]/T/
>
> The testing environment I used still employed 4 cores, 4 groups of hackbench (160 processes)
> and 1 cyclictest. If too many cgroups or processes are created on the 4 cores, the test
> results will fluctuate severely, making it difficult to discern any differences.
>
> The organization of cgroups was in two forms:
> 1. Within the same level cgroup, 10 sub-cgroups were created, with each cgroup having
> an average of 16 processes.
>
> EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY
>
> LNICE(-19) # Avg Latencies: 00572 00347 00502 00218
>
> LNICE(0) # Avg Latencies: 02262 02225 02442 02321
>
> LNICE(19) # Avg Latencies: 03132 03422 03333 03489
>
> 2. In the form of a binary tree, 8 leaf cgroups were established, with a depth of 4.
> On average, each cgroup had 20 processes
>
> EEVDF PATCH EEVDF-NO_PARITY PATCH-NO_PARITY
>
> LNICE(-19) # Avg Latencies: 00601 00592 00510 00400
>
> LNICE(0) # Avg Latencies: 02703 02170 02381 02126
>
> LNICE(19) # Avg Latencies: 04773 03387 04478 03611
>
> Based on the test results, there is a noticeable improvement in scheduling latency after
> applying the patch in scenarios involving multiple cgroups.
>
>
> thanks
> Chunxin
>
Hi Chunxin,

Thanks for sharing the test results. They look helpful, at least in this
cgroups scenario. I'm still curious which of the two changes helps
more in your test, as mentioned in the very first mail of this thread.

Thanks,
Honglei


2024-06-11 13:20:36

by Chunxin Zang

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible



> On Jun 7, 2024, at 10:38, Chen Yu <[email protected]> wrote:
>
> On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
>>
>>
>>> On Jun 6, 2024, at 01:19, Chen Yu <[email protected]> wrote:
>>>
>>>
>>> Sorry for the late reply and thanks for help clarify this. Yes, this is
>>> what my previous concern was:
>>> 1. It does not consider the cgroup and does not check preemption in the same
>>> level which is covered by find_matching_se().
>>> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
>>> later pick_eevdf() will check the eligible of current anyway. But
>>> as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
>>> I just wonder if we could leverage the cfs_rq->next to store the next
>>> candidate, so it can be picked directly in the 2nd pick as a fast path?
>>> Something like below untested:
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 8a5b1ae0aa55..f716646d595e 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
>>> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
>>> {
>>> struct task_struct *curr = rq->curr;
>>> - struct sched_entity *se = &curr->se, *pse = &p->se;
>>> + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
>>> struct cfs_rq *cfs_rq = task_cfs_rq(curr);
>>> int cse_is_idle, pse_is_idle;
>>>
>>> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>>> /*
>>> * XXX pick_eevdf(cfs_rq) != se ?
>>> */
>>> - if (pick_eevdf(cfs_rq) == pse)
>>> + next = pick_eevdf(cfs_rq);
>>> + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
>>> + set_next_buddy(next);
>>> +
>>> + if (next == pse)
>>> goto preempt;
>>>
>>> return;
>>>
>>>
>>> thanks,
>>> Chenyu
>>
>> Hi Chen
>>
>> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
>> the RB-tree twice, I initially had two methods in mind.
>> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
>> This idea is similar to the one you proposed this time.
>> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.'
>> Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
>> process to schedule' are two different things.
>
> I agree, and it seems that in current eevdf implementation the former relies on the latter.
>
>> 'check_preempt_wakeup_fair' is not just to
>> check if the newly awakened process should preempt the current process; it can also serve
>> as an opportunity to check whether any other processes should preempt the current one,
>> thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
>> the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
>> then the current process will still not be preempted.
>
> I thought Mike has proposed a patch to deal with this scenario you mentioned above:
> https://lore.kernel.org/lkml/[email protected]/
>
> And I suppose you are refering to increase the preemption chance on current rather than reducing
> the invoke of pick_eevdf() in check_preempt_wakeup_fair().

Hi chen

Happy holidays. I believe the modifications here will indeed provide more opportunities for preemption,
thereby leading to lower scheduling latencies, while also truly reducing calls to pick_eevdf. It's a win-win situation. :)

I conducted a test: I applied my modifications on top of Mike's patch, and added some
statistical counters following your previous method, in order to assess the potential
benefits of my changes.


diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 03be0d1330a6..c5453866899f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8283,6 +8286,10 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
struct sched_entity *se = &curr->se, *pse = &p->se;
struct cfs_rq *cfs_rq = task_cfs_rq(curr);
int cse_is_idle, pse_is_idle;
+ bool patch_preempt = false;
+ bool pick_preempt = false;
+
+ schedstat_inc(rq->check_preempt_count);

if (unlikely(se == pse))
return;
@@ -8343,15 +8350,31 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
cfs_rq = cfs_rq_of(se);
update_curr(cfs_rq);

+ if ((sched_feat(RUN_TO_PARITY) && se->vlag != se->deadline && !entity_eligible(cfs_rq, se))
+ || (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))) {
+ schedstat_inc(rq->patch_preempt_count);
+ patch_preempt = true;
+ }
+
/*
* XXX pick_eevdf(cfs_rq) != se ?
*/
- if (pick_eevdf(cfs_rq) == pse)
+ if (pick_eevdf(cfs_rq) != se) {
+ schedstat_inc(rq->pick_preempt_count);
+ pick_preempt = true;
goto preempt;
+ }

return;

preempt:
+ if (patch_preempt && !pick_preempt)
+ schedstat_inc(rq->patch_preempt_only_count);
+ if (!patch_preempt && pick_preempt)
+ schedstat_inc(rq->pick_preempt_only_count);
+
+ schedstat_inc(rq->need_preempt_count);
+
resched_curr(rq);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index d2242679239e..002c6b0f966a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1141,6 +1141,12 @@ struct rq {
/* try_to_wake_up() stats */
unsigned int ttwu_count;
unsigned int ttwu_local;
+ unsigned int check_preempt_count;
+ unsigned int need_preempt_count;
+ unsigned int patch_preempt_count;
+ unsigned int patch_preempt_only_count;
+ unsigned int pick_preempt_count;
+ unsigned int pick_preempt_only_count;
#endif

#ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..fe5487572409 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,21 @@ static int show_schedstat(struct seq_file *seq, void *v)

/* runqueue-specific stats */
seq_printf(seq,
- "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+ "cpu%d %u 0 %u %u %u %u %llu %llu %lu *** %u %u * %u %u * %u %u",
cpu, rq->yld_count,
rq->sched_count, rq->sched_goidle,
rq->ttwu_count, rq->ttwu_local,
rq->rq_cpu_time,
- rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+ rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+ rq->check_preempt_count,
+ rq->need_preempt_count,
+ rq->patch_preempt_count,
+ rq->patch_preempt_only_count,
+ rq->pick_preempt_count,
+ rq->pick_preempt_only_count);
+

seq_printf(seq, "\n");

The test results are as follows:

RUN_TO_PARITY:
EEVDF PATCH
.stat.check_preempt_count 5053054 5029546
.stat.need_preempt_count 0570520 1282780
.stat.patch_preempt_count ------- 0038602
.stat.patch_preempt_only_count ------- 0000000
.stat.pick_preempt_count ------- 1282780
.stat.pick_preempt_only_count ------- 1244178

NO_RUN_TO_PARITY:
EEVDF PATCH
.stat.check_preempt_count 5018589 5005812
.stat.need_preempt_count 3380513 2994773
.stat.patch_preempt_count ------- 0907927
.stat.patch_preempt_only_count ------- 0000000
.stat.pick_preempt_count ------- 2994773
.stat.pick_preempt_only_count ------- 2086846

Looking at the results, adding an ineligibility check for the se within check_preempt_wakeup_fair
can avoid about 3% of pick_eevdf calls under the RUN_TO_PARITY feature, and about 30% in the
NO_RUN_TO_PARITY case (patch_preempt_count over pick_preempt_count: 38602 / 1282780 and
907927 / 2994773). It was also observed that patch_preempt_only_count stays at 0, indicating that
every preemption triggered by the ineligibility check would also have been triggered by
pick_eevdf, so the check never causes a spurious preemption.

It's worth mentioning that under the RUN_TO_PARITY feature, the number of preemptions
triggered by 'pick_eevdf != se' would be 2.25 times that of the original version, which could
lead to a series of other performance issues. However, logically speaking, this is indeed reasonable. :(


>
>> Therefore, I posted the v2 PATCH.
>> The implementation of v2 PATCH might express this point more clearly.
>> https://lore.kernel.org/lkml/[email protected]/T/
>>
>
> Let me take a look at it and do some tests.

Thank you for doing this :)

>
>> I previously implemented and tested both of these methods, and the test results showed that
>> method 2 had somewhat more obvious benefits. Therefore, I submitted method 2. Now that I
>> think about it, perhaps method 1 could also be viable at the same time. :)
>>
>
> Actually I found that, even without any changes, if we enabled sched feature NEXT_BUDDY, the
> wakeup latency/request latency are both reduced. The following is the schbench result on a
> 240 CPUs system:
>
> NO_NEXT_BUDDY
> Wakeup Latencies percentiles (usec) runtime 100 (s) (1698990 total samples)
>        50.0th: 6 (429125 samples)
>        90.0th: 14 (682355 samples)
>       * 99.0th: 29 (126695 samples)
>        99.9th: 529 (14603 samples)
>        min=1, max=4741
> Request Latencies percentiles (usec) runtime 100 (s) (1702523 total samples)
>        50.0th: 14992 (550939 samples)
>        90.0th: 15376 (668687 samples)
>       * 99.0th: 15600 (128111 samples)
>        99.9th: 15888 (11238 samples)
>        min=3528, max=31677
> RPS percentiles (requests) runtime 100 (s) (101 total samples)
>        20.0th: 16864 (31 samples)
>       * 50.0th: 16928 (26 samples)
>        90.0th: 17248 (36 samples)
>        min=16615, max=20041
> average rps: 17025.23
>
> NEXT_BUDDY
> Wakeup Latencies percentiles (usec) runtime 100 (s) (1653564 total samples)
>        50.0th: 5 (376845 samples)
>        90.0th: 12 (632075 samples)
>       * 99.0th: 24 (114398 samples)
>        99.9th: 105 (13737 samples)
>        min=1, max=7428
> Request Latencies percentiles (usec) runtime 100 (s) (1657268 total samples)
>        50.0th: 14480 (524763 samples)
>        90.0th: 15216 (647982 samples)
>       * 99.0th: 15472 (130730 samples)
>        99.9th: 15728 (13980 samples)
>        min=3542, max=34805
> RPS percentiles (requests) runtime 100 (s) (101 total samples)
>        20.0th: 16544 (62 samples)
>       * 50.0th: 16544 (0 samples)
>        90.0th: 16608 (37 samples)
>        min=16470, max=16648
> average rps: 16572.68
>
> So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
>
> thanks,
> Chenyu

I'm not completely sure if my understanding is correct, but NEXT_BUDDY can only cache the process
that has been woken up; it doesn't necessarily correspond to the result returned by pick_eevdf. Furthermore,
even if it does cache the result returned by pick_eevdf, by the time the next scheduling occurs, due to
other processes enqueueing or dequeuing, it might not be the result picked by pick_eevdf at that moment.
Hence, it's a 'best effort' approach, and therefore, its impact on scheduling latency may vary depending
on the use case.
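
As a side note, a paraphrase of the eligibility test that both the buddy and a cached pick
depend on (simplified; the real vruntime_eligible() avoids the implied division by
cross-multiplying with the queue load):

static bool entity_eligible_sketch(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
	/*
	 * An entity is eligible while its vruntime is not ahead of the
	 * load-weighted average vruntime of the queue. Every enqueue or
	 * dequeue moves that average, which is why a remembered candidate
	 * can silently stop being the right answer.
	 */
	return (s64)(se->vruntime - avg_vruntime(cfs_rq)) <= 0;
}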

thanks
Chunxin


2024-06-13 11:47:32

by Chen Yu

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Reschedule the cfs_rq when current is ineligible

On 2024-06-11 at 21:10:50 +0800, Chunxin Zang wrote:
>
>
> > On Jun 7, 2024, at 10:38, Chen Yu <[email protected]> wrote:
> >
> > On 2024-06-06 at 09:46:53 +0800, Chunxin Zang wrote:
> >>
> >>
> >>> On Jun 6, 2024, at 01:19, Chen Yu <[email protected]> wrote:
> >>>
> >>>
> >>> Sorry for the late reply and thanks for help clarify this. Yes, this is
> >>> what my previous concern was:
> >>> 1. It does not consider the cgroup and does not check preemption in the same
> >>> level which is covered by find_matching_se().
> >>> 2. The if (!entity_eligible(cfs_rq, se)) for current is redundant because
> >>> later pick_eevdf() will check the eligible of current anyway. But
> >>> as pointed out by Chunxi, his concern is the double-traverse of the rb-tree,
> >>> I just wonder if we could leverage the cfs_rq->next to store the next
> >>> candidate, so it can be picked directly in the 2nd pick as a fast path?
> >>> Something like below untested:
> >>>
> >>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >>> index 8a5b1ae0aa55..f716646d595e 100644
> >>> --- a/kernel/sched/fair.c
> >>> +++ b/kernel/sched/fair.c
> >>> @@ -8349,7 +8349,7 @@ static void set_next_buddy(struct sched_entity *se)
> >>> static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
> >>> {
> >>> struct task_struct *curr = rq->curr;
> >>> - struct sched_entity *se = &curr->se, *pse = &p->se;
> >>> + struct sched_entity *se = &curr->se, *pse = &p->se, *next;
> >>> struct cfs_rq *cfs_rq = task_cfs_rq(curr);
> >>> int cse_is_idle, pse_is_idle;
> >>>
> >>> @@ -8415,7 +8415,11 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
> >>> /*
> >>> * XXX pick_eevdf(cfs_rq) != se ?
> >>> */
> >>> - if (pick_eevdf(cfs_rq) == pse)
> >>> + next = pick_eevdf(cfs_rq);
> >>> + if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK) && next)
> >>> + set_next_buddy(next);
> >>> +
> >>> + if (next == pse)
> >>> goto preempt;
> >>>
> >>> return;
> >>>
> >>>
> >>> thanks,
> >>> Chenyu
> >>
> >> Hi Chen
> >>
> >> First of all, thank you for your patient response. Regarding the issue of avoiding traversing
> >> the RB-tree twice, I initially had two methods in mind.
> >> 1. Cache the optimal result so that it can be used directly during the second pick_eevdf operation.
> >> This idea is similar to the one you proposed this time.
> >> 2. Avoid the pick_eevdf operation as much as possible within 'check_preempt_wakeup_fair.'
> >> Because I believe that 'checking whether preemption is necessary' and 'finding the optimal
> >> process to schedule' are two different things.
> >
> > I agree, and it seems that in current eevdf implementation the former relies on the latter.
> >
> >> 'check_preempt_wakeup_fair' is not just to
> >> check if the newly awakened process should preempt the current process; it can also serve
> >> as an opportunity to check whether any other processes should preempt the current one,
> >> thereby improving the real-time performance of the scheduler. Although now in pick_eevdf,
> >> the legitimacy of 'curr' is also evaluated, if the result returned is not the awakened process,
> >> then the current process will still not be preempted.
> >
> > I thought Mike has proposed a patch to deal with this scenario you mentioned above:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > And I suppose you are refering to increase the preemption chance on current rather than reducing
> > the invoke of pick_eevdf() in check_preempt_wakeup_fair().
>
> Hi chen
>
> Happy holidays. I believe the modifications here will indeed provide more opportunities for preemption,
> thereby leading to lower scheduling latencies, while also truly reducing calls to pick_eevdf. It's a win-win situation. :)
>
> I conducted a test. It involved applying my modifications on top of MIKE PATCH, along with
> adding some statistical counts following your previous method, in order to assess the potential
> benefits of my changes.
>

[snip]

> Looking at the results, adding an ineligible check for the se within check_preempt_wakeup_fair
> can prevent 3% of pick_eevdf calls under the RUN_TO_PARITY feature, and in the case of
> NO_RUN_TO_PARITY, it can prevent 30% of pick_eevdf calls. It was also discovered that the
> patch_preempt_only_count is at 0, indicating that all invalid checks for the se are correct.
>
> It's worth mentioning that under the RUN_TO_PARITY feature, the number of preemptions
> triggered by 'pick_eevdf != se' would be 2.25 times that of the original version, which could
> lead to a series of other performance issues. However, logically speaking, this is indeed reasonable. :(
>
>

I wonder if we can only do this for NO_RUN_TO_PARITY? That is to say, if RUN_TO_PARITY is enabled,
we do not preempt the current task based on its eligibility in check_preempt_wakeup_fair()
or entity_tick(). Personally I don't have objection to increase the preemption a little bit, however
it seems that we have encountered over-scheduling and that is why RUN_TO_PARITY was introduced,
and RUN_TO_PARITY means "respect the slice" per my understanding.
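
An untested sketch of that idea, reusing the naming from the v1 hunk (illustration only, not a
tested patch): the shortcut would only fire when RUN_TO_PARITY is disabled, so the slice of the
current task keeps being respected by default.

	/* Only take the eligibility shortcut when RUN_TO_PARITY is off. */
	if (!sched_feat(RUN_TO_PARITY) && !entity_eligible(cfs_rq, se))
		goto preempt;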

> > So I think NEXT_BUDDY has more or less reduced the rb-tree scan.
> >
> > thanks,
> > Chenyu
>
> I'm not completely sure if my understanding is correct, but NEXT_BUDDY can only cache the process
> that has been woken up; it doesn't necessarily correspond to the result returned by pick_eevdf. Furthermore,
> even if it does cache the result returned by pick_eevdf, by the time the next scheduling occurs, due to
> other processes enqueing or dequeuing, it might not be the result picked by pick_eevdf at that moment.
> Hence, it's a 'best effort' approach, and therefore, its impact on scheduling latency may vary depending
> on the use case.
>

That is true. Currently the NEXT_BUDDY is set to the wakee if it is eligible, which does not mean
it is the best candidate in the tree. I think it is a 'best effort' mechanism to reduce wakeup
latency rather than a fairness one.
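
For reference, the wakeup side of that best-effort behaviour looks roughly like the following
(simplified, not verbatim kernel code): the wakee is remembered as the buddy in
check_preempt_wakeup_fair(), and whether it is actually used depends on it still being
eligible when the next pick runs.

	/* Wakeup path: remember the wakee as the preferred next pick. */
	if (sched_feat(NEXT_BUDDY) && !(wake_flags & WF_FORK))
		set_next_buddy(pse);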

thanks,
Chenyu