2020-08-20 12:02:33

by Jiang Biao

Subject: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

From: Jiang Biao <[email protected]>

Vruntime compensation has been down in place_entity() to
boot the waking procedure for fair tasks. There is no need to
do that for SCHED_IDLE task actually.

Not compensating vruntime for SCHED_IDLE task could make
SCHED_IDLE task more harmless for normal tasks.

Signed-off-by: Jiang Biao <[email protected]>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1a68a0536add..adff77676a0a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
vruntime += sched_vslice(cfs_rq, se);

/* sleeps up to a single latency don't count. */
- if (!initial) {
+ if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
unsigned long thresh = sysctl_sched_latency;

/*
--
2.21.0


2020-08-20 12:52:48

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 14:00, Jiang Biao <[email protected]> wrote:
>
> From: Jiang Biao <[email protected]>
>
> Vruntime compensation has been down in place_entity() to
> boot the waking procedure for fair tasks. There is no need to

s/boot/boost/ ?

> do that for SCHED_IDLE task actually.
>
> Not compensating vruntime for SCHED_IDLE task could make
> SCHED_IDLE task more harmless for normal tasks.
>
> Signed-off-by: Jiang Biao <[email protected]>
> ---
> kernel/sched/fair.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 1a68a0536add..adff77676a0a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> vruntime += sched_vslice(cfs_rq, se);
>
> /* sleeps up to a single latency don't count. */
> - if (!initial) {
> + if (!initial && likely(!task_has_idle_policy(task_of(se)))) {

What if se is not a task ?

> unsigned long thresh = sysctl_sched_latency;
>
> /*
> --
> 2.21.0
>

2020-08-20 13:00:26

by Peter Zijlstra

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, Aug 20, 2020 at 02:51:06PM +0200, Vincent Guittot wrote:
> On Thu, 20 Aug 2020 at 14:00, Jiang Biao <[email protected]> wrote:
> >
> > From: Jiang Biao <[email protected]>
> >
> > Vruntime compensation has been down in place_entity() to
> > boot the waking procedure for fair tasks. There is no need to
>
> s/boot/boost/ ?
>
> > do that for SCHED_IDLE task actually.
> >
> > Not compensating vruntime for SCHED_IDLE task could make
> > SCHED_IDLE task more harmless for normal tasks.

This is rather weak. It would be much better if there were some actual data
to support this claim.

> > Signed-off-by: Jiang Biao <[email protected]>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1a68a0536add..adff77676a0a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > vruntime += sched_vslice(cfs_rq, se);
> >
> > /* sleeps up to a single latency don't count. */
> > - if (!initial) {
> > + if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
>
> What if se is not a task ?

Then we very much need it, because it might have fair tasks inside. I
suppose you could do something complicated with idle_h_nr_running, but
is all that really worth the effort?

> > unsigned long thresh = sysctl_sched_latency;
> >
> > /*
> > --
> > 2.21.0
> >

2020-08-20 13:17:02

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 14:58, <[email protected]> wrote:
>
> On Thu, Aug 20, 2020 at 02:51:06PM +0200, Vincent Guittot wrote:
> > On Thu, 20 Aug 2020 at 14:00, Jiang Biao <[email protected]> wrote:
> > >
> > > From: Jiang Biao <[email protected]>
> > >
> > > Vruntime compensation has been down in place_entity() to
> > > boot the waking procedure for fair tasks. There is no need to
> >
> > s/boot/boost/ ?
> >
> > > do that for SCHED_IDLE task actually.
> > >
> > > Not compensating vruntime for SCHED_IDLE task could make
> > > SCHED_IDLE task more harmless for normal tasks.
>
> This is rather week. It would be much better if there's some actual data
> to support this claim.
>
> > > Signed-off-by: Jiang Biao <[email protected]>
> > > ---
> > > kernel/sched/fair.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 1a68a0536add..adff77676a0a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > vruntime += sched_vslice(cfs_rq, se);
> > >
> > > /* sleeps up to a single latency don't count. */
> > > - if (!initial) {
> > > + if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
> >
> > What if se is not a task ?
>
> Then we very much need it, because it might have fair tasks inside. I
> suppose you could do something complicated with idle_h_nr_running, but
> is all that really worth the effort?

Not sure that Jiang is using cgroups, otherwise he would have seen a
warning, I think.
That being said, not compensating the vruntime for a sched_idle task
makes sense to me, even if that will only help the other tasks in the
same cfs_rq.

>
> > > unsigned long thresh = sysctl_sched_latency;
> > >
> > > /*
> > > --
> > > 2.21.0
> > >

2020-08-20 13:48:34

by Peter Zijlstra

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

> That's been said, not compensating the vruntime for a sched_idle task
> makes sense for me. Even if that will only help for others task in the
> same cfs_rq

Yeah, but is it worth the extra pointer chasing and branches?

Then again, I suppose we started all that with the idle_h_nr_running
nonsense :/

2020-08-20 14:13:26

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
>
> > That's been said, not compensating the vruntime for a sched_idle task
> > makes sense for me. Even if that will only help for others task in the
> > same cfs_rq
>
> Yeah, but it is worth the extra pointer chasing and branches?

For that, I'll let Jiang provide figures to show whether it's worth it.

>
> Then again, I suppose we started all that with the idle_h_nr_running
> nonsense :/

2020-08-20 14:17:16

by Jiang Biao

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 20:51, Vincent Guittot
<[email protected]> wrote:
>
> On Thu, 20 Aug 2020 at 14:00, Jiang Biao <[email protected]> wrote:
> >
> > From: Jiang Biao <[email protected]>
> >
> > Vruntime compensation has been down in place_entity() to
> > boot the waking procedure for fair tasks. There is no need to
>
> s/boot/boost/ ?
Yes. -:)
>
> > do that for SCHED_IDLE task actually.
> >
> > Not compensating vruntime for SCHED_IDLE task could make
> > SCHED_IDLE task more harmless for normal tasks.
> >
> > Signed-off-by: Jiang Biao <[email protected]>
> > ---
> > kernel/sched/fair.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 1a68a0536add..adff77676a0a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > vruntime += sched_vslice(cfs_rq, se);
> >
> > /* sleeps up to a single latency don't count. */
> > - if (!initial) {
> > + if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
>
> What if se is not a task ?
Is it ok to limit the case to a task se, like?
+ if (!initial && likely(!entity_is_task(se) ||
+                        !task_has_idle_policy(task_of(se)))) {

Thx.
Regards,
Jiang
>
> > unsigned long thresh = sysctl_sched_latency;
> >
> > /*
> > --
> > 2.21.0
> >

2020-08-20 14:23:37

by Jiang Biao

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 20:58, <[email protected]> wrote:
>
> On Thu, Aug 20, 2020 at 02:51:06PM +0200, Vincent Guittot wrote:
> > On Thu, 20 Aug 2020 at 14:00, Jiang Biao <[email protected]> wrote:
> > >
> > > From: Jiang Biao <[email protected]>
> > >
> > > Vruntime compensation has been down in place_entity() to
> > > boot the waking procedure for fair tasks. There is no need to
> >
> > s/boot/boost/ ?
> >
> > > do that for SCHED_IDLE task actually.
> > >
> > > Not compensating vruntime for SCHED_IDLE task could make
> > > SCHED_IDLE task more harmless for normal tasks.
>
> This is rather week. It would be much better if there's some actual data
> to support this claim.
I'll try to run a test and get some data. :)

>
> > > Signed-off-by: Jiang Biao <[email protected]>
> > > ---
> > > kernel/sched/fair.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > > index 1a68a0536add..adff77676a0a 100644
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -4115,7 +4115,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int initial)
> > > vruntime += sched_vslice(cfs_rq, se);
> > >
> > > /* sleeps up to a single latency don't count. */
> > > - if (!initial) {
> > > + if (!initial && likely(!task_has_idle_policy(task_of(se)))) {
> >
> > What if se is not a task ?
>
> Then we very much need it, because it might have fair tasks inside. I
> suppose you could do something complicated with idle_h_nr_running, but
> is all that really worth the effort?
Would it be better to limit it to the task se case? Like
+ if (!initial && likely(!entity_is_task(se) ||
+                        !task_has_idle_policy(task_of(se)))) {



>
> > > unsigned long thresh = sysctl_sched_latency;
> > >
> > > /*
> > > --
> > > 2.21.0
> > >

2020-08-20 14:25:22

by Jiang Biao

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
<[email protected]> wrote:
>
> On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> >
> > > That's been said, not compensating the vruntime for a sched_idle task
> > > makes sense for me. Even if that will only help for others task in the
> > > same cfs_rq
> >
> > Yeah, but it is worth the extra pointer chasing and branches?
>
> For that I let Jiang provides figures to show the worthful
I'll run some tests later and show the wakeup latency difference
with the patch applied.

Thanks for your time.
Regards
Jiang
>
> >
> > Then again, I suppose we started all that with the idle_h_nr_running
> > nonsense :/

2020-08-21 00:43:12

by kernel test robot

Subject: [sched/fair] 88d13f778f: WARNING:at_kernel/sched/fair.c:#place_entity

Greeting,

FYI, we noticed the following commit (built with gcc-9):

commit: 88d13f778f3a28132b50a0d698a559de695e8a66 ("[PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task")
url: https://github.com/0day-ci/linux/commits/Jiang-Biao/sched-fair-avoid-vruntime-compensation-for-SCHED_IDLE-task/20200820-200323
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 5f4a1c4ea44728aa80be21dbf3a0469b5ca81d88

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 8G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+----------------------------------------------+------------+------------+
| | 5f4a1c4ea4 | 88d13f778f |
+----------------------------------------------+------------+------------+
| boot_successes | 9 | 0 |
| boot_failures | 0 | 4 |
| WARNING:at_kernel/sched/fair.c:#place_entity | 0 | 4 |
| RIP:place_entity | 0 | 4 |
+----------------------------------------------+------------+------------+


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 9.183115] WARNING: CPU: 0 PID: 1 at kernel/sched/fair.c:263 place_entity+0x85/0xa0
[ 9.185749] Modules linked in: ip_tables
[ 9.186716] CPU: 0 PID: 1 Comm: systemd Not tainted 5.9.0-rc1-00115-g88d13f778f3a28 #1
[ 9.188770] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 9.190637] RIP: 0010:place_entity+0x85/0xa0
[ 9.191717] Code: 48 81 3b 00 00 10 00 75 25 48 01 c5 eb cd 80 3d d8 1b 70 01 00 75 aa 48 c7 c7 06 ba 33 aa c6 05 c8 1b 70 01 01 e8 9d c6 fb ff <0f> 0b eb 93 48 89 da be 00 00 10 00 48 89 c7 e8 47 d3 ff ff eb c9
[ 9.195587] RSP: 0018:ffffa7b180013dd0 EFLAGS: 00010086
[ 9.196787] RAX: 0000000000000000 RBX: ffff9ae9d74f0600 RCX: 0000000000000000
[ 9.198252] RDX: 0000000000000003 RSI: ffffffffaafbecb3 RDI: ffffffffaafbc86c
[ 9.199896] RBP: 00000000e79bcfc3 R08: 00000002235af97b R09: 0000000000000013
[ 9.201365] R10: 000000000000072d R11: 0000000000000000 R12: 0000000000000000
[ 9.202843] R13: 0000000000000001 R14: ffff9aea3fc2b300 R15: 0000000000000000
[ 9.204463] FS: 00007f8c948f3940(0000) GS:ffff9aea3fc00000(0000) knlGS:0000000000000000
[ 9.206319] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.207669] CR2: 0000558ff0cb9b98 CR3: 000000012f9f0000 CR4: 00000000000006f0
[ 9.209192] Call Trace:
[ 9.210687] enqueue_entity+0x218/0x3a0
[ 9.211809] enqueue_task_fair+0x98/0x720
[ 9.212759] sched_move_task+0x184/0x1a0
[ 9.213694] autogroup_move_group+0x97/0x180
[ 9.214682] sched_autogroup_create_attach+0xca/0x1c0
[ 9.215908] ksys_setsid+0xe9/0x120
[ 9.216764] __do_sys_setsid+0xa/0x20
[ 9.217650] do_syscall_64+0x33/0x40
[ 9.218518] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 9.219753] RIP: 0033:0x7f8c95a81797
[ 9.220630] Code: 73 01 c3 48 8b 0d f9 36 0f 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 70 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 36 0f 00 f7 d8 64 89 01 48
[ 9.224523] RSP: 002b:00007ffe1ba622a8 EFLAGS: 00000202 ORIG_RAX: 0000000000000070
[ 9.226255] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f8c95a81797
[ 9.227834] RDX: 00000000ffffffff RSI: 00007ffe1ba62260 RDI: 0000000000000000
[ 9.229301] RBP: 0000000000000005 R08: 0000000000000020 R09: 0000000000000000
[ 9.230772] R10: 0000000000000008 R11: 0000000000000202 R12: 00007ffe1ba625d8
[ 9.232371] R13: 0000558ff0cae6d0 R14: 0000000000000000 R15: 0000000000000001
[ 9.233901] ---[ end trace c7ae918a59a4910d ]---


To reproduce:

# build kernel
cd linux
cp config-5.9.0-rc1-00115-g88d13f778f3a28 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
lkp


Attachments:
config-5.9.0-rc1-00115-g88d13f778f3a28 (161.51 kB)
job-script (4.95 kB)
dmesg.xz (14.73 kB)

2020-08-23 08:07:13

by Jiang Biao

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

Hi, Vincent and Peter

On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
<[email protected]> wrote:
>
> On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> >
> > > That's been said, not compensating the vruntime for a sched_idle task
> > > makes sense for me. Even if that will only help for others task in the
> > > same cfs_rq
> >
> > Yeah, but it is worth the extra pointer chasing and branches?
>
> For that I let Jiang provides figures to show the worthful
Using the following configuration for rt-app,
{
    "tasks" : {
        "task_other" : {
            "instance" : 1, // only 1 instance, to be easy to observe
            "cpus" : [2],
            "loop" : 2000,
            "policy" : "SCHED_OTHER",
            "run" : -1, // make the normal task 100% running
            "priority" : 0,
            "sleep" : 0
        },
        "task_idle" : {
            "instance" : 1,
            "cpus" : [2],
            "loop" : 2000,
            "policy" : "SCHED_IDLE",
            // run only 1us, to avoid blocking (always waiting to run) and
            // to keep check_preempt_wakeup working (S->R switching)
            "run" : 1,
            "timer" : { "ref" : "unique2", "period" : 16000, "mode" : "absolute" }
        }
    },
    "global" : {
        "calibration" : "CPU0",
        "default_policy" : "SCHED_OTHER",
        "duration" : -1
    }
}
without the patch,
<...>-39771 [002] d.h. 42478.177771: sched_wakeup: comm=task_idle-1 pid=39772 prio=120 target_cpu=002
<...>-39771 [002] d... 42478.190437: sched_switch: prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=39772 next_prio=120
<...>-39771 [002] d.h. 42478.193771: sched_wakeup: comm=task_idle-1 pid=39772 prio=120 target_cpu=002
<...>-39771 [002] d... 42478.206438: sched_switch: prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=39772 next_prio=120
<...>-39771 [002] d.h. 42478.209771: sched_wakeup: comm=task_idle-1 pid=39772 prio=120 target_cpu=002
<...>-39771 [002] d... 42478.222438: sched_switch: prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=39772 next_prio=120
<...>-39771 [002] d.h. 42478.225771: sched_wakeup: comm=task_idle-1 pid=39772 prio=120 target_cpu=002
<...>-39771 [002] d... 42478.238438: sched_switch: prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=39772 next_prio=120
*task_idle* preempts every 12ms because of the compensation.

with the patch,
task_other-0-27670 [002] d.h. 136785.278059: sched_wakeup: comm=task_idle-1 pid=27671 prio=120 target_cpu=002
task_other-0-27670 [002] d... 136785.293623: sched_switch: prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=27671 next_prio=120
task_other-0-27670 [002] d.h. 136785.294059: sched_wakeup: comm=task_idle-1 pid=27671 prio=120 target_cpu=002
task_other-0-27670 [002] d... 136785.317624: sched_switch: prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=27671 next_prio=120
task_other-0-27670 [002] d.h. 136785.326059: sched_wakeup: comm=task_idle-1 pid=27671 prio=120 target_cpu=002
task_other-0-27670 [002] d... 136785.341622: sched_switch: prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=27671 next_prio=120
task_other-0-27670 [002] d.h. 136785.342059: sched_wakeup: comm=task_idle-1 pid=27671 prio=120 target_cpu=002
task_other-0-27670 [002] d... 136785.365623: sched_switch: prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==> next_comm=task_idle-1 next_pid=27671 next_prio=120
*task_idle* preempts every 24 or 16 ms.

This patch reduces how often task_idle preempts, and thus the
interference from the SCHED_IDLE task on normal tasks.

Thx.
Regards,
Jiang

2020-08-28 12:57:02

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Sun, 23 Aug 2020 at 09:33, Jiang Biao <[email protected]> wrote:
>
> Hi, Vincent and Peter
>
> On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
> <[email protected]> wrote:
> >
> > On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> > >
> > > > That's been said, not compensating the vruntime for a sched_idle task
> > > > makes sense for me. Even if that will only help for others task in the
> > > > same cfs_rq
> > >
> > > Yeah, but it is worth the extra pointer chasing and branches?
> >
> > For that I let Jiang provides figures to show the worthful
> [... rt-app configuration and trace output snipped ...]
>
> *task_idle* preempts every 24 or 16 ms.
>
> This patch could reduce the preempting frequency of task_idle, and
> reduce the interference from SCHED_IDLE task.

For this use case, the preemption is only 1us long. Is that a
realistic problem case? Your normal threads might be more impacted by
ticks, interrupts, timers and other things than by this 1us idle
thread every 16ms. I mean, the patch moves the impact from 1us every
12ms (0.01%) to 1us every 24ms (0.005%). Then, if the idle thread
starts to run a bit longer, the period before it preempts the normal
thread quickly increases.

What is the improvement for an idle thread trying to run 1ms every
16ms, as an example?

Regards,
Vincent
>
> Thx.
> Regards,
> Jiang

2020-09-01 10:16:30

by Jiang Biao

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

Hi, Vincent

Sorry for the late reply.:)

On Fri, 28 Aug 2020 at 20:55, Vincent Guittot
<[email protected]> wrote:
>
> On Sun, 23 Aug 2020 at 09:33, Jiang Biao <[email protected]> wrote:
> >
> > Hi, Vincent and Peter
> >
> > On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
> > <[email protected]> wrote:
> > >
> > > On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> > > >
> > > > > That's been said, not compensating the vruntime for a sched_idle task
> > > > > makes sense for me. Even if that will only help for others task in the
> > > > > same cfs_rq
> > > >
> > > > Yeah, but it is worth the extra pointer chasing and branches?
> > >
> > > For that I let Jiang provides figures to show the worthful
> > [... rt-app configuration and trace output snipped ...]
>
> For this use case, the preemption is only 1us long. Is it a realistic
> problem use case ? your normal threads might be more impacted by tick,
Nope.
1us is just to make the logic in place_entity() work. If the run were
longer, the IDLE task could not finish its work before being preempted
by the normal task, and the IDLE task would then stay in RUNNING state.
In that case place_entity() would never be reached, because of the
RUNNING state.

> interrupt, timer and others things than this 1us idle thread every
> 16ms. I mean, the patch moves the impact from 1us every 12ms (0.01%)
> to 1us every 24ms (0.005%). Then, If the idle thread starts to run a
> bit longer, the period before preempting the normal thread quickly
> increases
Exactly.

>
> What is the improvement for an idle thread trying to run 1ms every
> 16ms as an example ?
If it were to run 1ms, the IDLE task would always be in RUNNING state,
as said above. In that case place_entity() would not come into play,
and the preemption would happen every 340ms, as before.

Thx.
Regards,
Jiang

2020-09-01 13:08:44

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

On Tue, 1 Sep 2020 at 12:15, Jiang Biao <[email protected]> wrote:
>
> Hi, Vincent
>
> Sorry for the late reply.:)
>
> On Fri, 28 Aug 2020 at 20:55, Vincent Guittot
> <[email protected]> wrote:
> >
> > On Sun, 23 Aug 2020 at 09:33, Jiang Biao <[email protected]> wrote:
> > >
> > > Hi, Vincent and Peter
> > >
> > > On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
> > > <[email protected]> wrote:
> > > >
> > > > On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> > > > >
> > > > > > That's been said, not compensating the vruntime for a sched_idle task
> > > > > > makes sense for me. Even if that will only help for others task in the
> > > > > > same cfs_rq
> > > > >
> > > > > Yeah, but it is worth the extra pointer chasing and branches?
> > > >
> > > > For that I let Jiang provides figures to show the worthful
> > > Using the following configuration for rt-app,
> > > {
> > > "tasks" : {
> > > "task_other" : {
> > > "instance" : 1, //only 1 instance to be easy to observe
> > > "cpus" : [2],
> > > "loop" : 2000,
> > > "policy" : "SCHED_OTHER",
> > > "run" : -1, //make normal task 100% running
> > > "priority" : 0,
> > > "sleep" : 0
> > > },
> > > "task_idle" : {
> > > "instance" : 1,
> > > "cpus" : [2],
> > > "loop" : 2000,
> > > "policy" : "SCHED_IDLE",
> > > "run" : 1, //only run 1us to avoid
> > > blocking(always waiting for running), making check_preempt_wakeup
> > > work(S->R switching)
> > > "timer" : { "ref" : "unique2" , "period" :
> > > 16000, "mode" : "absolute" }
> > > }
> > > },
> > > "global" : {
> > > "calibration" : "CPU0",
> > > "default_policy" : "SCHED_OTHER",
> > > "duration" : -1
> > > }
> > > }
> > > without the patch,
> > > <...>-39771 [002] d.h. 42478.177771: sched_wakeup:
> > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > <...>-39771 [002] d... 42478.190437: sched_switch:
> > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > <...>-39771 [002] d.h. 42478.193771: sched_wakeup:
> > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > <...>-39771 [002] d... 42478.206438: sched_switch:
> > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > <...>-39771 [002] d.h. 42478.209771: sched_wakeup:
> > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > <...>-39771 [002] d... 42478.222438: sched_switch:
> > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > <...>-39771 [002] d.h. 42478.225771: sched_wakeup:
> > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > <...>-39771 [002] d... 42478.238438: sched_switch:
> > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > *task_idle* preempts every 12ms because of the compensation.
> > >
> > > with the patch,
> > > task_other-0-27670 [002] d.h. 136785.278059: sched_wakeup:
> > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > task_other-0-27670 [002] d... 136785.293623: sched_switch:
> > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > task_other-0-27670 [002] d.h. 136785.294059: sched_wakeup:
> > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > task_other-0-27670 [002] d... 136785.317624: sched_switch:
> > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > task_other-0-27670 [002] d.h. 136785.326059: sched_wakeup:
> > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > task_other-0-27670 [002] d... 136785.341622: sched_switch:
> > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > task_other-0-27670 [002] d.h. 136785.342059: sched_wakeup:
> > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > task_other-0-27670 [002] d... 136785.365623: sched_switch:
> > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > *task_idle* preempts every 24 or 16 ms.
> > >
> > This patch could reduce the preemption frequency of task_idle, and
> > reduce the interference from the SCHED_IDLE task.
> >
> > For this use case, the preemption is only 1us long. Is it a realistic
> > problem use case? Your normal threads might be more impacted by tick,
> Nope.
> 1us is just to make the logic in place_entity() work. If the preemption were
> longer, the IDLE task could not finish its work before being preempted by the
> normal task again, and the IDLE task would always stay in RUNNING state from
> then on. In that case place_entity() would never be reached, because of the
> RUNNING state.

Yeah, I agree that the setup is the right one to check the worst-case
wakeup preemption period, but it doesn't sound like a realistic
problem.

Have you tried this on your system, and does it improve anything?
Otherwise, I agree with Peter that it isn't worth having an
additional test in the wakeup path if it doesn't help any real case.

>
> > interrupt, timer and other things than by this 1us idle thread every
> > 16ms. I mean, the patch moves the impact from 1us every 12ms (0.01%)
> > to 1us every 24ms (0.005%). Then, if the idle thread starts to run a
> > bit longer, the period before it preempts the normal thread quickly
> > increases.
> Exactly.
>
> >
> > What is the improvement for an idle thread trying to run 1ms every
> > 16ms, as an example?
> If it ran for 1ms, the IDLE task would always be in RUNNING state, as said
> above. In that case, place_entity() would not apply, and the preemption
> would happen every 340ms as always.
>
> Thx.
> Regards,
> Jiang

2020-09-01 14:25:55

by Jiang Biao

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: avoid vruntime compensation for SCHED_IDLE task

Hi,

On Tue, 1 Sep 2020 at 21:04, Vincent Guittot <[email protected]> wrote:
>
> On Tue, 1 Sep 2020 at 12:15, Jiang Biao <[email protected]> wrote:
> >
> > Hi, Vincent
> >
> > Sorry for the late reply.:)
> >
> > On Fri, 28 Aug 2020 at 20:55, Vincent Guittot
> > <[email protected]> wrote:
> > >
> > > On Sun, 23 Aug 2020 at 09:33, Jiang Biao <[email protected]> wrote:
> > > >
> > > > Hi, Vincent and Peter
> > > >
> > > > On Thu, 20 Aug 2020 at 22:09, Vincent Guittot
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Thu, 20 Aug 2020 at 15:44, <[email protected]> wrote:
> > > > > >
> > > > > > > That's been said, not compensating the vruntime for a sched_idle task
> > > > > > > makes sense for me. Even if that will only help for others task in the
> > > > > > > same cfs_rq
> > > > > >
> > > > > > Yeah, but is it worth the extra pointer chasing and branches?
> > > > >
> > > > > For that, I'll let Jiang provide figures to show whether it's worthwhile.
> > > > Using the following configuration for rt-app,
> > > > {
> > > > "tasks" : {
> > > > "task_other" : {
> > > > "instance" : 1, //only 1 instance to be easy to observe
> > > > "cpus" : [2],
> > > > "loop" : 2000,
> > > > "policy" : "SCHED_OTHER",
> > > > "run" : -1, //make normal task 100% running
> > > > "priority" : 0,
> > > > "sleep" : 0
> > > > },
> > > > "task_idle" : {
> > > > "instance" : 1,
> > > > "cpus" : [2],
> > > > "loop" : 2000,
> > > > "policy" : "SCHED_IDLE",
> > > >                         "run" : 1, //run only 1us to avoid blocking (always waiting for the CPU), making check_preempt_wakeup() work (S->R switching)
> > > >                         "timer" : { "ref" : "unique2" , "period" : 16000, "mode" : "absolute" }
> > > > }
> > > > },
> > > > "global" : {
> > > > "calibration" : "CPU0",
> > > > "default_policy" : "SCHED_OTHER",
> > > > "duration" : -1
> > > > }
> > > > }
> > > > without the patch,
> > > > <...>-39771 [002] d.h. 42478.177771: sched_wakeup:
> > > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > > <...>-39771 [002] d... 42478.190437: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > > <...>-39771 [002] d.h. 42478.193771: sched_wakeup:
> > > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > > <...>-39771 [002] d... 42478.206438: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > > <...>-39771 [002] d.h. 42478.209771: sched_wakeup:
> > > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > > <...>-39771 [002] d... 42478.222438: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > > <...>-39771 [002] d.h. 42478.225771: sched_wakeup:
> > > > comm=task_idle-1 pid=39772 prio=120 target_cpu=002
> > > > <...>-39771 [002] d... 42478.238438: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=39771 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=39772 next_prio=120
> > > > *task_idle* preempts every 12ms because of the compensation.
> > > >
> > > > with the patch,
> > > > task_other-0-27670 [002] d.h. 136785.278059: sched_wakeup:
> > > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > > task_other-0-27670 [002] d... 136785.293623: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > > task_other-0-27670 [002] d.h. 136785.294059: sched_wakeup:
> > > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > > task_other-0-27670 [002] d... 136785.317624: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > > task_other-0-27670 [002] d.h. 136785.326059: sched_wakeup:
> > > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > > task_other-0-27670 [002] d... 136785.341622: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > > task_other-0-27670 [002] d.h. 136785.342059: sched_wakeup:
> > > > comm=task_idle-1 pid=27671 prio=120 target_cpu=002
> > > > task_other-0-27670 [002] d... 136785.365623: sched_switch:
> > > > prev_comm=task_other-0 prev_pid=27670 prev_prio=120 prev_state=R ==>
> > > > next_comm=task_idle-1 next_pid=27671 next_prio=120
> > > > *task_idle* preempts every 24 or 16 ms.
> > > >
> > > > This patch could reduce the preemption frequency of task_idle, and
> > > > reduce the interference from the SCHED_IDLE task.
> > >
> > > For this use case, the preemption is only 1us long. Is it a realistic
> > > problem use case? Your normal threads might be more impacted by tick,
> > Nope.
> > 1us is just to make the logic in place_entity() work. If the preemption were
> > longer, the IDLE task could not finish its work before being preempted by the
> > normal task again, and the IDLE task would always stay in RUNNING state from
> > then on. In that case place_entity() would never be reached, because of the
> > RUNNING state.
>
> Yeah, I agree that the setup is the right one to check the worst-case
> wakeup preemption period, but it doesn't sound like a realistic
> problem.
Indeed.

>
> Have you tried this on your system, and does it improve anything?
> Otherwise, I agree with Peter that it isn't worth having an
> additional test in the wakeup path if it doesn't help any real case.
We have not used SCHED_IDLE for running offline tasks yet, so we
cannot give a realistic scenario for that.
So let's just forget about it for now.

Much appreciated for your time again. :)
Thx.
Regards,
Jiang