2021-11-24 15:43:00

by Vincent Donnefort

Subject: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

select_idle_sibling() will return prev_cpu for the case where the task is
woken up by a per-CPU kthread. However, the idle task has been recently
modified and is now identified by is_per_cpu_kthread(), breaking the
behaviour described above. Using !is_idle_task() ensures we do not
spuriously trigger that select_idle_sibling() exit path.

Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
Signed-off-by: Vincent Donnefort <[email protected]>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 945d987246c5..8bf95b0e368d 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
* pattern is IO completions.
*/
if (is_per_cpu_kthread(current) &&
+ !is_idle_task(current) &&
prev == smp_processor_id() &&
this_rq()->nr_running <= 1) {
return prev;
--
2.25.1



2021-11-24 16:28:32

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 24/11/21 15:42, Vincent Donnefort wrote:
> select_idle_sibling() will return prev_cpu for the case where the task is
> woken up by a per-CPU kthread. However, the idle task has been recently
> modified and is now identified by is_per_cpu_kthread(), breaking the
> behaviour described above. Using !is_idle_task() ensures we do not
> spuriously trigger that select_idle_sibling() exit path.
>
> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")

This patch-set is the gift that keeps on giving... I owe a lot of folks a
lot of beer :(

> Signed-off-by: Vincent Donnefort <[email protected]>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 945d987246c5..8bf95b0e368d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> * pattern is IO completions.
> */
> if (is_per_cpu_kthread(current) &&
> + !is_idle_task(current) &&
> prev == smp_processor_id() &&
^^^^^^^^^^^^^^^^^^^^^^^^^^
(1)

> this_rq()->nr_running <= 1) {

So if we get to here, it means we failed

if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
asym_fits_capacity(task_util, target))
return target;

AFAICT (1) implies "prev == target" (target can be either prev or the
waking CPU), so per the above this implies prev isn't idle. If current is
the idle task, we can still have stuff enqueued (which matches nr_running
<= 1) and be on our way to schedule_idle(), or have rq->ttwu_pending (per
idle_cpu()) - IOW matching against the idle task here can lead to undesired
coscheduling.
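
For reference, idle_cpu() (which available_idle_cpu() wraps) is roughly
this in kernel/sched/core.c, which is why a CPU with just
rq->ttwu_pending set still doesn't count as idle:

        int idle_cpu(int cpu)
        {
                struct rq *rq = cpu_rq(cpu);

                /* Something other than the idle task is current */
                if (rq->curr != rq->idle)
                        return 0;

                /* Tasks are enqueued, even if not running yet */
                if (rq->nr_running)
                        return 0;

        #ifdef CONFIG_SMP
                /* A remote wakeup is queued but not yet processed */
                if (rq->ttwu_pending)
                        return 0;
        #endif

                return 1;
        }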

If the above isn't bonkers:

Reviewed-by: Valentin Schneider <[email protected]>

> return prev;
> --
> 2.25.1

2021-11-25 09:16:36

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Wed, 24 Nov 2021 at 16:42, Vincent Donnefort
<[email protected]> wrote:
>
> select_idle_sibling() will return prev_cpu for the case where the task is
> woken up by a per-CPU kthread. However, the idle task has been recently
> modified and is now identified by is_per_cpu_kthread(), breaking the
> behaviour described above. Using !is_idle_task() ensures we do not
> spuriously trigger that select_idle_sibling() exit path.
>
> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
> Signed-off-by: Vincent Donnefort <[email protected]>
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 945d987246c5..8bf95b0e368d 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> * pattern is IO completions.
> */
> if (is_per_cpu_kthread(current) &&
> + !is_idle_task(current) &&
> prev == smp_processor_id() &&
> this_rq()->nr_running <= 1) {
> return prev;

AFAICT, this can't happen on a symmetric system, because prev would
already have been returned by the earlier conditions.
Only an asymmetric system can face such a situation, when the task
doesn't fit, which is the subject of your other patch.
So this patch seems irrelevant outside of that one.
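
For reference, the conditions I mean are the ones at the top of
select_idle_sibling(), roughly (paraphrased from kernel/sched/fair.c):

        /* An idle, fitting target (== prev here) was already returned: */
        if ((available_idle_cpu(target) || sched_idle_cpu(target)) &&
            asym_fits_capacity(task_util, target))
                return target;

        /* ... and so was a cache-affine, idle prev: */
        if (prev != target && cpus_share_cache(prev, target) &&
            (available_idle_cpu(prev) || sched_idle_cpu(prev)) &&
            asym_fits_capacity(task_util, prev))
                return prev;

On a symmetric system asym_fits_capacity() is always true, so an idle
prev would already have been returned above.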


> --
> 2.25.1
>

2021-11-25 11:17:43

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 25/11/21 10:05, Vincent Guittot wrote:
> On Wed, 24 Nov 2021 at 16:42, Vincent Donnefort
> <[email protected]> wrote:
>>
>> select_idle_sibling() will return prev_cpu for the case where the task is
>> woken up by a per-CPU kthread. However, the idle task has been recently
>> modified and is now identified by is_per_cpu_kthread(), breaking the
>> behaviour described above. Using !is_idle_task() ensures we do not
>> spuriously trigger that select_idle_sibling() exit path.
>>
>> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
>> Signed-off-by: Vincent Donnefort <[email protected]>
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 945d987246c5..8bf95b0e368d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>> * pattern is IO completions.
>> */
>> if (is_per_cpu_kthread(current) &&
>> + !is_idle_task(current) &&
>> prev == smp_processor_id() &&
>> this_rq()->nr_running <= 1) {
>> return prev;
>
> AFAICT, this can't happen on a symmetric system, because prev would
> already have been returned by the earlier conditions.
> Only an asymmetric system can face such a situation, when the task
> doesn't fit, which is the subject of your other patch.
> So this patch seems irrelevant outside of that one.
>

I think you can still hit this on a symmetric system; let me try to
reformulate my other email.

If this (non-patched) condition evaluates to true, it means the previous
condition

(available_idle_cpu(target) || sched_idle_cpu(target)) &&
asym_fits_capacity(task_util, target)

evaluated to false, so for a symmetric system target sure isn't idle.

prev == smp_processor_id() implies prev == target, IOW prev isn't
idle. Now, consider:

p0.prev = CPU1
p1.prev = CPU1

CPU0                                   CPU1
current = don't care                   current = swapper/1

ttwu(p1)
  ttwu_queue(p1, CPU1)
  // or
  ttwu_queue_wakelist(p1, CPU1)

                                       hrtimer_wakeup()
                                         wake_up_process()
                                           ttwu()
                                             idle_cpu(CPU1)? no

                                             is_per_cpu_kthread(current)? yes
                                             prev == smp_processor_id()? yes
                                             this_rq()->nr_running <= 1? yes
                                             => self enqueue

                                       ...
                                       schedule_idle()

This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
around, we'd still be stacking p0 and p1 onto CPU1.

IOW this opens a window between a remote ttwu() and the idle task invoking
schedule_idle() where the idle task can stack more tasks onto its CPU.
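
Recall that since the Fixes commit the idle task satisfies
is_per_cpu_kthread(), which is simply (kernel/sched/sched.h, roughly):

        static inline bool is_per_cpu_kthread(struct task_struct *p)
        {
                if (!(p->flags & PF_KTHREAD))
                        return false;

                if (p->nr_cpus_allowed != 1)
                        return false;

                return true;
        }

swapper/1 now has PF_KTHREAD set and is affine to a single CPU, so it
matches.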

>
>> --
>> 2.25.1
>>

2021-11-25 13:20:07

by Dietmar Eggemann

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 25.11.21 12:16, Valentin Schneider wrote:
> On 25/11/21 10:05, Vincent Guittot wrote:
>> On Wed, 24 Nov 2021 at 16:42, Vincent Donnefort
>> <[email protected]> wrote:
>>>
>>> select_idle_sibling() will return prev_cpu for the case where the task is
>>> woken up by a per-CPU kthread. However, the idle task has been recently
>>> modified and is now identified by is_per_cpu_kthread(), breaking the
>>> behaviour described above. Using !is_idle_task() ensures we do not
>>> spuriously trigger that select_idle_sibling() exit path.
>>>
>>> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
>>> Signed-off-by: Vincent Donnefort <[email protected]>
>>>
>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>> index 945d987246c5..8bf95b0e368d 100644
>>> --- a/kernel/sched/fair.c
>>> +++ b/kernel/sched/fair.c
>>> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
>>> * pattern is IO completions.
>>> */
>>> if (is_per_cpu_kthread(current) &&
>>> + !is_idle_task(current) &&
>>> prev == smp_processor_id() &&
>>> this_rq()->nr_running <= 1) {
>>> return prev;
>>
>> AFAICT, this can't happen on a symmetric system, because prev would
>> already have been returned by the earlier conditions.
>> Only an asymmetric system can face such a situation, when the task
>> doesn't fit, which is the subject of your other patch.
>> So this patch seems irrelevant outside of that one.
>>
>
> I think you can still hit this on a symmetric system; let me try to
> reformulate my other email.
>
> If this (non-patched) condition evaluates to true, it means the previous
> condition
>
> (available_idle_cpu(target) || sched_idle_cpu(target)) &&
> asym_fits_capacity(task_util, target)
>
> evaluated to false, so for a symmetric system target sure isn't idle.
>
> prev == smp_processor_id() implies prev == target, IOW prev isn't
> idle. Now, consider:
>
> p0.prev = CPU1
> p1.prev = CPU1
>
> CPU0                                   CPU1
> current = don't care                   current = swapper/1
>
> ttwu(p1)
>   ttwu_queue(p1, CPU1)
>   // or
>   ttwu_queue_wakelist(p1, CPU1)
>
>                                        hrtimer_wakeup()
>                                          wake_up_process()
>                                            ttwu()
>                                              idle_cpu(CPU1)? no
>
>                                              is_per_cpu_kthread(current)? yes
>                                              prev == smp_processor_id()? yes
>                                              this_rq()->nr_running <= 1? yes
>                                              => self enqueue
>
>                                        ...
>                                        schedule_idle()
>
> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
> around, we'd still be stacking p0 and p1 onto CPU1.
>
> IOW this opens a window between a remote ttwu() and the idle task invoking
> schedule_idle() where the idle task can stack more tasks onto its CPU.

I can see this happening on my Hikey620 (symmetric) when `this = prev =
target`.

available_idle_cpu(target) returns 0. rq->curr is rq->idle but
rq->nr_running is 1.

trace_printk() in sis()'s `if (is_per_cpu_kthread(current) && ...)`
condition.
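
A sketch of the probe (hypothetical reconstruction, not the literal
statement; the fields match the output below):

        if (is_per_cpu_kthread(current) &&
            prev == smp_processor_id() &&
            this_rq()->nr_running <= 1)
                trace_printk("this=%d prev=%d target=%d rq->curr=[%s %d] rq->nr_running=%u p=[%s %d] current=[%s %d]\n",
                             smp_processor_id(), prev, target,
                             this_rq()->curr->comm, this_rq()->curr->pid,
                             this_rq()->nr_running,
                             p->comm, p->pid, current->comm, current->pid);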

<idle>-0 [005] this=5 prev=5 target=5 rq->curr=[swapper/5 0] rq->nr_running=1 p=[kworker/u16:3 89] current=[swapper/5 0]
<idle>-0 [007] this=7 prev=7 target=7 rq->curr=[swapper/7 0] rq->nr_running=1 p=[rcu_preempt 11] current=[swapper/7 0]
<idle>-0 [005] this=5 prev=5 target=5 rq->curr=[swapper/5 0] rq->nr_running=1 p=[kworker/u16:1 74] current=[swapper/5 0]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
available_idle_cpu(target)->idle_cpu(target)


2021-11-25 13:35:00

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
<[email protected]> wrote:
>
> On 25/11/21 10:05, Vincent Guittot wrote:
> > On Wed, 24 Nov 2021 at 16:42, Vincent Donnefort
> > <[email protected]> wrote:
> >>
> >> select_idle_sibling() will return prev_cpu for the case where the task is
> >> woken up by a per-CPU kthread. However, the idle task has been recently
> >> modified and is now identified by is_per_cpu_kthread(), breaking the
> >> behaviour described above. Using !is_idle_task() ensures we do not
> >> spuriously trigger that select_idle_sibling() exit path.
> >>
> >> Fixes: 00b89fe0197f ("sched: Make the idle task quack like a per-CPU kthread")
> >> Signed-off-by: Vincent Donnefort <[email protected]>
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index 945d987246c5..8bf95b0e368d 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -6399,6 +6399,7 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> >> * pattern is IO completions.
> >> */
> >> if (is_per_cpu_kthread(current) &&
> >> + !is_idle_task(current) &&
> >> prev == smp_processor_id() &&
> >> this_rq()->nr_running <= 1) {
> >> return prev;
> >
> > AFAICT, this can't happen on a symmetric system, because prev would
> > already have been returned by the earlier conditions.
> > Only an asymmetric system can face such a situation, when the task
> > doesn't fit, which is the subject of your other patch.
> > So this patch seems irrelevant outside of that one.
> >
>
> I think you can still hit this on a symmetric system; let me try to
> reformulate my other email.
>
> If this (non-patched) condition evaluates to true, it means the previous
> condition
>
> (available_idle_cpu(target) || sched_idle_cpu(target)) &&
> asym_fits_capacity(task_util, target)
>
> evaluated to false, so for a symmetric system target sure isn't idle.
>
> prev == smp_processor_id() implies prev == target, IOW prev isn't
> idle. Now, consider:
>
> p0.prev = CPU1
> p1.prev = CPU1
>
> >> CPU0                                   CPU1
> >> current = don't care                   current = swapper/1
> >>
> >> ttwu(p1)
> >>   ttwu_queue(p1, CPU1)
> >>   // or
> >>   ttwu_queue_wakelist(p1, CPU1)
> >>
> >>                                        hrtimer_wakeup()
> >>                                          wake_up_process()
> >>                                            ttwu()
> >>                                              idle_cpu(CPU1)? no
> >>
> >>                                              is_per_cpu_kthread(current)? yes
> >>                                              prev == smp_processor_id()? yes
> >>                                              this_rq()->nr_running <= 1? yes
> >>                                              => self enqueue
> >>
> >>                                        ...
> >>                                        schedule_idle()
>
> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
> around, we'd still be stacking p0 and p1 onto CPU1.
>
> IOW this opens a window between a remote ttwu() and the idle task invoking
> schedule_idle() where the idle task can stack more tasks onto its CPU.

Your use case above is out of the scope of this patch and has always
been there, even for other per-CPU kthreads. In such a case, the wakeup
is not triggered by current (idle or another per-CPU kthread) but by
an interrupt (hrtimer in your case). If we want to filter wakeups
generated by interrupt context while a per-CPU kthread is running, it
would be better to fix all cases and test the running context, like
this:

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
* essentially a sync wakeup. An obvious example of this
* pattern is IO completions.
*/
- if (is_per_cpu_kthread(current) &&
+ if (!in_interrupt() &&
+ is_per_cpu_kthread(current) &&
prev == smp_processor_id() &&
this_rq()->nr_running <= 1) {
return prev;
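
(Where in_interrupt() is the usual preempt.h test, roughly:

        #define irq_count()     (nmi_count() | hardirq_count() | softirq_count())
        #define in_interrupt()  (irq_count())

i.e. true in NMI, hardirq and softirq context; note that
softirq_count() is also raised in BH-disabled sections.)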

>
> >
> >> --
> >> 2.25.1
> >>

2021-11-25 15:32:15

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 25/11/21 14:23, Vincent Guittot wrote:
> On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
> <[email protected]> wrote:
>> I think you can still hit this on a symmetric system; let me try to
>> reformulate my other email.
>>
>> If this (non-patched) condition evaluates to true, it means the previous
>> condition
>>
>> (available_idle_cpu(target) || sched_idle_cpu(target)) &&
>> asym_fits_capacity(task_util, target)
>>
>> evaluated to false, so for a symmetric system target sure isn't idle.
>>
>> prev == smp_processor_id() implies prev == target, IOW prev isn't
>> idle. Now, consider:
>>
>> p0.prev = CPU1
>> p1.prev = CPU1
>>
>> CPU0                                   CPU1
>> current = don't care                   current = swapper/1
>>
>> ttwu(p1)
>>   ttwu_queue(p1, CPU1)
>>   // or
>>   ttwu_queue_wakelist(p1, CPU1)
>>
>>                                        hrtimer_wakeup()
>>                                          wake_up_process()
>>                                            ttwu()
>>                                              idle_cpu(CPU1)? no
>>
>>                                              is_per_cpu_kthread(current)? yes
>>                                              prev == smp_processor_id()? yes
>>                                              this_rq()->nr_running <= 1? yes
>>                                              => self enqueue
>>
>>                                        ...
>>                                        schedule_idle()
>>
>> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
>> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
>> around, we'd still be stacking p0 and p1 onto CPU1.
>>
>> IOW this opens a window between a remote ttwu() and the idle task invoking
>> schedule_idle() where the idle task can stack more tasks onto its CPU.
>
> Your use case above is out of the scope of this patch and has always
> been there, even for other per-CPU kthreads. In such a case, the wakeup
> is not triggered by current (idle or another per-CPU kthread) but by
> an interrupt (hrtimer in your case).

Technically the idle task didn't pass is_per_cpu_kthread(p) when that
condition was added, this is somewhat of a "new development" - but you're
right on the hardirq side of things.
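
(For context, the Fixes commit has init_idle() tag the idle tasks
roughly like so:

        idle->flags |= PF_IDLE | PF_KTHREAD | PF_NO_SETAFFINITY;
        kthread_set_per_cpu(idle, cpu);

which is what lets swapper pass is_per_cpu_kthread() here.)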

> If we want to filter wakeups
> generated by interrupt context while a per-CPU kthread is running, it
> would be better to fix all cases and test the running context, like
> this:
>

I think that could make sense - though can the idle task issue wakeups in
process context? If so that won't be sufficient. A quick audit tells me:

o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
o I didn't see any wakeup issued from the cpu_pm_notifier call chain
o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
this thing in RCU:

  smp_call_function_single(cpu, rcu_exp_handler)

    rcu_exp_handler()
      rcu_report_exp_rdp()
        rcu_report_exp_cpu_mult()
          __rcu_report_exp_rnp()
            swake_up_one()

IIUC if set_nr_if_polling() succeeds then the smp_call won't send an IPI and
should be handled in that flush_foo_from_idle() call.
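
For completeness, set_nr_if_polling() is roughly this
(kernel/sched/core.c):

        static bool set_nr_if_polling(struct task_struct *p)
        {
                struct thread_info *ti = task_thread_info(p);
                typeof(ti->flags) old, val = READ_ONCE(ti->flags);

                for (;;) {
                        if (!(val & _TIF_POLLING_NRFLAG))
                                return false;
                        if (val & _TIF_NEED_RESCHED)
                                return true;
                        old = cmpxchg(&ti->flags, val, val | _TIF_NEED_RESCHED);
                        if (old == val)
                                break;
                        val = old;
                }
                return true;
        }

i.e. if the target's idle loop is polling on TIF_NEED_RESCHED, the flag
is folded in and no IPI is sent.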

I'd be tempted to stick your and VincentD's conditions together, just to be
safe...

> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> * essentially a sync wakeup. An obvious example of this
> * pattern is IO completions.
> */
> - if (is_per_cpu_kthread(current) &&
> + if (!in_interrupt() &&
> + is_per_cpu_kthread(current) &&
> prev == smp_processor_id() &&
> this_rq()->nr_running <= 1) {
> return prev;
>
>>
>> >
>> >> --
>> >> 2.25.1
>> >>

2021-11-26 08:34:34

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Thu, 25 Nov 2021 at 16:30, Valentin Schneider
<[email protected]> wrote:
>
> On 25/11/21 14:23, Vincent Guittot wrote:
> > On Thu, 25 Nov 2021 at 12:16, Valentin Schneider
> > <[email protected]> wrote:
> >> I think you can still hit this on a symmetric system; let me try to
> >> reformulate my other email.
> >>
> >> If this (non-patched) condition evaluates to true, it means the previous
> >> condition
> >>
> >> (available_idle_cpu(target) || sched_idle_cpu(target)) &&
> >> asym_fits_capacity(task_util, target)
> >>
> >> evaluated to false, so for a symmetric system target sure isn't idle.
> >>
> >> prev == smp_processor_id() implies prev == target, IOW prev isn't
> >> idle. Now, consider:
> >>
> >> p0.prev = CPU1
> >> p1.prev = CPU1
> >>
> >> CPU0                                   CPU1
> >> current = don't care                   current = swapper/1
> >>
> >> ttwu(p1)
> >>   ttwu_queue(p1, CPU1)
> >>   // or
> >>   ttwu_queue_wakelist(p1, CPU1)
> >>
> >>                                        hrtimer_wakeup()
> >>                                          wake_up_process()
> >>                                            ttwu()
> >>                                              idle_cpu(CPU1)? no
> >>
> >>                                              is_per_cpu_kthread(current)? yes
> >>                                              prev == smp_processor_id()? yes
> >>                                              this_rq()->nr_running <= 1? yes
> >>                                              => self enqueue
> >>
> >>                                        ...
> >>                                        schedule_idle()
> >>
> >> This works if CPU0 does either a full enqueue (rq->nr_running == 1) or just
> >> a wakelist enqueue (rq->ttwu_pending > 0). If there was an idle CPU3
> >> around, we'd still be stacking p0 and p1 onto CPU1.
> >>
> >> IOW this opens a window between a remote ttwu() and the idle task invoking
> >> schedule_idle() where the idle task can stack more tasks onto its CPU.
> >
> > Your use case above is out of the scope of this patch and has always
> > been there, even for other per-CPU kthreads. In such a case, the wakeup
> > is not triggered by current (idle or another per-CPU kthread) but by
> > an interrupt (hrtimer in your case).
>
> Technically the idle task didn't pass is_per_cpu_kthread(p) when that
> condition was added, this is somewhat of a "new development" - but you're
> right on the hardirq side of things.
>
> > If we want to filter wakeups
> > generated by interrupt context while a per-CPU kthread is running, it
> > would be better to fix all cases and test the running context, like
> > this:
> >
>
> I think that could make sense - though can the idle task issue wakeups in
> process context? If so that won't be sufficient. A quick audit tells me:
>
> o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
> o I didn't see any wakeup issued from the cpu_pm_notifier call chain
> o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
> this thing in RCU:
>
>   smp_call_function_single(cpu, rcu_exp_handler)
>
>     rcu_exp_handler()
>       rcu_report_exp_rdp()
>         rcu_report_exp_cpu_mult()
>           __rcu_report_exp_rnp()
>             swake_up_one()
>
> IIUC if set_nr_if_polling() succeeds then the smp_call won't send an IPI and
> should be handled in that flush_foo_from_idle() call.

Aren't all these meant to wake up on the local CPU? So I don't see any
real problem there.

>
> I'd be tempted to stick your and VincentD's conditions together, just to be
> safe...

More than being safe, I would prefer that we fix the actual root cause
instead of hiding it.

>
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6397,7 +6397,8 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> > * essentially a sync wakeup. An obvious example of this
> > * pattern is IO completions.
> > */
> > - if (is_per_cpu_kthread(current) &&
> > + if (!in_interrupt() &&
> > + is_per_cpu_kthread(current) &&
> > prev == smp_processor_id() &&
> > this_rq()->nr_running <= 1) {
> > return prev;
> >
> >>
> >> >
> >> >> --
> >> >> 2.25.1
> >> >>

2021-11-26 13:34:29

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 26/11/21 09:23, Vincent Guittot wrote:
> On Thu, 25 Nov 2021 at 16:30, Valentin Schneider
> <[email protected]> wrote:
>> On 25/11/21 14:23, Vincent Guittot wrote:
>> > If we want to filter wakeups
>> > generated by interrupt context while a per-CPU kthread is running, it
>> > would be better to fix all cases and test the running context, like
>> > this:
>> >
>>
>> I think that could make sense - though can the idle task issue wakeups in
>> process context? If so that won't be sufficient. A quick audit tells me:
>>
>> o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
>> o I didn't see any wakeup issued from the cpu_pm_notifier call chain
>> o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
>> this thing in RCU:
>>
>>   smp_call_function_single(cpu, rcu_exp_handler)
>>
>>     rcu_exp_handler()
>>       rcu_report_exp_rdp()
>>         rcu_report_exp_cpu_mult()
>>           __rcu_report_exp_rnp()
>>             swake_up_one()
>>
>> IIUC if set_nr_if_polling() succeeds then the smp_call won't send an IPI and
>> should be handled in that flush_foo_from_idle() call.
>
> Aren't all these meant to wake up on the local CPU? So I don't see any
> real problem there.
>

Hm, so other than boot-time oddities, I think that does end up with threads
of a !UNBOUND (so pcpu) workqueue...

>>
>> I'd be tempted to stick your and VincentD's conditions together, just to be
>> safe...
>
> More than being safe, I would prefer that we fix the actual root cause
> instead of hiding it.
>

I did play around a bit to see if this could be true when evaluating that
is_per_cpu_kthread() condition:

is_idle_task(current) && in_task() && p->nr_cpus_allowed > 1

but no luck so far. An in_task() check would appear sufficient, but how's
this?

---

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 884f29d07963..f45806b7f47a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6390,14 +6390,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
return prev;

/*
- * Allow a per-cpu kthread to stack with the wakee if the
- * kworker thread and the tasks previous CPUs are the same.
- * The assumption is that the wakee queued work for the
- * per-cpu kthread that is now complete and the wakeup is
- * essentially a sync wakeup. An obvious example of this
+ * Allow a per-cpu kthread to stack with the wakee if the kworker thread
+ * and the tasks previous CPUs are the same. The assumption is that the
+ * wakee queued work for the per-cpu kthread that is now complete and
+ * the wakeup is essentially a sync wakeup. An obvious example of this
* pattern is IO completions.
+ *
+ * Ensure the wakeup is issued by the kthread itself, and don't match
+ * against the idle task because that could override the
+ * available_idle_cpu(target) check done higher up.
*/
- if (is_per_cpu_kthread(current) &&
+ if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
+ in_task() &&
prev == smp_processor_id() &&
this_rq()->nr_running <= 1) {
return prev;
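
(in_task() being the preempt.h helper, roughly:

        #define in_task()       (!(in_nmi() | in_hardirq() | in_serving_softirq()))

unlike !in_interrupt() it only excludes actually serving a softirq, not
plain BH-disabled sections.)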


2021-11-26 15:01:54

by Vincent Guittot

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Fri, 26 Nov 2021 at 14:32, Valentin Schneider
<[email protected]> wrote:
>
> On 26/11/21 09:23, Vincent Guittot wrote:
> > On Thu, 25 Nov 2021 at 16:30, Valentin Schneider
> > <[email protected]> wrote:
> >> On 25/11/21 14:23, Vincent Guittot wrote:
> >> > If we want to filter wakeups
> >> > generated by interrupt context while a per-CPU kthread is running, it
> >> > would be better to fix all cases and test the running context, like
> >> > this:
> >> >
> >>
> >> I think that could make sense - though can the idle task issue wakeups in
> >> process context? If so that won't be sufficient. A quick audit tells me:
> >>
> >> o rcu_nocb_flush_deferred_wakeup() happens before calling into cpuidle
> >> o I didn't see any wakeup issued from the cpu_pm_notifier call chain
> >> o I'm not entirely sure about flush_smp_call_function_from_idle(). I found
> >> this thing in RCU:
> >>
> >>   smp_call_function_single(cpu, rcu_exp_handler)
> >>
> >>     rcu_exp_handler()
> >>       rcu_report_exp_rdp()
> >>         rcu_report_exp_cpu_mult()
> >>           __rcu_report_exp_rnp()
> >>             swake_up_one()
> >>
> >> IIUC if set_nr_if_polling() succeeds then the smp_call won't send an IPI and
> >> should be handled in that flush_foo_from_idle() call.
> >
> > Aren't all these meant to wake up on the local CPU? So I don't see any
> > real problem there.
> >
>
> Hm, so other than boot-time oddities, I think that does end up with threads
> of a !UNBOUND (so pcpu) workqueue...
>
> >>
> >> I'd be tempted to stick your and VincentD's conditions together, just to be
> >> safe...
> >
> > More than being safe, I would prefer that we fix the actual root cause
> > instead of hiding it.
> >
>
> I did play around a bit to see if this could be true when evaluating that
> is_per_cpu_kthread() condition:
>
> is_idle_task(current) && in_task() && p->nr_cpus_allowed > 1
>
> but no luck so far. An in_task() check would appear sufficient, but how's
> this?
>
> ---
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 884f29d07963..f45806b7f47a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6390,14 +6390,18 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
> return prev;
>
> /*
> - * Allow a per-cpu kthread to stack with the wakee if the
> - * kworker thread and the tasks previous CPUs are the same.
> - * The assumption is that the wakee queued work for the
> - * per-cpu kthread that is now complete and the wakeup is
> - * essentially a sync wakeup. An obvious example of this
> + * Allow a per-cpu kthread to stack with the wakee if the kworker thread
> + * and the tasks previous CPUs are the same. The assumption is that the
> + * wakee queued work for the per-cpu kthread that is now complete and
> + * the wakeup is essentially a sync wakeup. An obvious example of this
> * pattern is IO completions.
> + *
> + * Ensure the wakeup is issued by the kthread itself, and don't match
> + * against the idle task because that could override the
> + * available_idle_cpu(target) check done higher up.
> */
> - if (is_per_cpu_kthread(current) &&
> + if (is_per_cpu_kthread(current) && !is_idle_task(current) &&

Still, I don't see the need for !is_idle_task(current).


> + in_task() &&
> prev == smp_processor_id() &&
> this_rq()->nr_running <= 1) {
> return prev;
>

2021-11-26 16:51:21

by Valentin Schneider

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 26/11/21 15:40, Vincent Guittot wrote:
> On Fri, 26 Nov 2021 at 14:32, Valentin Schneider
> <[email protected]> wrote:
>> /*
>> - * Allow a per-cpu kthread to stack with the wakee if the
>> - * kworker thread and the tasks previous CPUs are the same.
>> - * The assumption is that the wakee queued work for the
>> - * per-cpu kthread that is now complete and the wakeup is
>> - * essentially a sync wakeup. An obvious example of this
>> + * Allow a per-cpu kthread to stack with the wakee if the kworker thread
>> + * and the tasks previous CPUs are the same. The assumption is that the
>> + * wakee queued work for the per-cpu kthread that is now complete and
>> + * the wakeup is essentially a sync wakeup. An obvious example of this
>> * pattern is IO completions.
>> + *
>> + * Ensure the wakeup is issued by the kthread itself, and don't match
>> + * against the idle task because that could override the
>> + * available_idle_cpu(target) check done higher up.
>> */
>> - if (is_per_cpu_kthread(current) &&
>> + if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
>
> Still, I don't see the need for !is_idle_task(current).
>

Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
which can lead to coscheduling when the wakeup is issued by the idle task
(or even if rq->nr_running == 0, you can have rq->ttwu_pending without
having sent an IPI due to polling). Essentially this overrides the first
check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
target).
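
To spell out the ttwu_pending part: the remote wakelist path sets it
before (conditionally) sending the IPI, roughly (kernel/sched/core.c):

        static void __ttwu_queue_wakelist(struct task_struct *p, int cpu, int wake_flags)
        {
                struct rq *rq = cpu_rq(cpu);

                p->sched_remote_wakeup = !!(wake_flags & WF_MIGRATED);

                /* Visible to idle_cpu() before any IPI is (maybe) sent */
                WRITE_ONCE(rq->ttwu_pending, 1);
                __smp_call_single_queue(cpu, &p->wake_entry.llist);
        }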

I couldn't prove such wakeups can happen right now, but if/when they do
(AIUI it would just take someone to add a wake_up_process() down some
smp_call_function() callback) then we'll need the above. If you're still
not convinced by now, I won't push it further.

>
>> + in_task() &&
>> prev == smp_processor_id() &&
>> this_rq()->nr_running <= 1) {
>> return prev;
>>

2021-11-26 17:20:28

by Vincent Donnefort

Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Fri, Nov 26, 2021 at 04:49:12PM +0000, Valentin Schneider wrote:
> On 26/11/21 15:40, Vincent Guittot wrote:
> > On Fri, 26 Nov 2021 at 14:32, Valentin Schneider
> > <[email protected]> wrote:
> >> /*
> >> - * Allow a per-cpu kthread to stack with the wakee if the
> >> - * kworker thread and the tasks previous CPUs are the same.
> >> - * The assumption is that the wakee queued work for the
> >> - * per-cpu kthread that is now complete and the wakeup is
> >> - * essentially a sync wakeup. An obvious example of this
> >> + * Allow a per-cpu kthread to stack with the wakee if the kworker thread
> >> + * and the tasks previous CPUs are the same. The assumption is that the
> >> + * wakee queued work for the per-cpu kthread that is now complete and
> >> + * the wakeup is essentially a sync wakeup. An obvious example of this
> >> * pattern is IO completions.
> >> + *
> >> + * Ensure the wakeup is issued by the kthread itself, and don't match
> >> + * against the idle task because that could override the
> >> + * available_idle_cpu(target) check done higher up.
> >> */
> >> - if (is_per_cpu_kthread(current) &&
> >> + if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
> >
> > Still, I don't see the need for !is_idle_task(current).
> >
>
> Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> which can lead to coscheduling when the wakeup is issued by the idle task
> (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> having sent an IPI due to polling). Essentially this overrides the first
> check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> target).
>
> I couldn't prove such wakeups can happen right now, but if/when they do
> (AIUI it would just take someone to add a wake_up_process() down some
> smp_call_function() callback) then we'll need the above. If you're still
> not convinced by now, I won't push it further.

From a quick experiment, even with the asym_fits_capacity() check, I can
trigger the following:

[ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.171943] select_idle_sibling: wakee=rcu_sched:10 nr_cpus_allowed=8 current=swapper/0:1 in_task=1

So the in_task() condition doesn't appear to be enough to filter these
wakeups while we have the swapper as current.

>
> >
> >> + in_task() &&
> >> prev == smp_processor_id() &&
> >> this_rq()->nr_running <= 1) {
> >> return prev;
> >>

2021-11-29 08:43:37

by Oliver Sang

Subject: [sched/fair] 8d0920b981: stress-ng.sem.ops_per_sec 11.9% improvement



Greetings,

FYI, we noticed an 11.9% improvement of stress-ng.sem.ops_per_sec due to commit:


commit: 8d0920b981b634bfedfd0746451839d6f5d7f707 ("[PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task")
url: https://github.com/0day-ci/linux/commits/Vincent-Donnefort/sched-fair-Fix-detection-of-per-CPU-kthreads-waking-a-task/20211124-234430
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 8c92606ab81086db00cbb73347d124b4eb169b7e
patch link: https://lore.kernel.org/lkml/[email protected]

in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory
with following parameters:

nr_threads: 100%
testtime: 60s
sc_pid_max: 4194304
class: scheduler
test: sem
cpufreq_governor: performance
ucode: 0x5003006






Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if you come across any failure that blocks the test,
# please remove the ~/.lkp and /lkp dirs to run from a clean state.

=========================================================================================
class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/sc_pid_max/tbox_group/test/testcase/testtime/ucode:
scheduler/gcc-9/performance/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/4194304/lkp-csl-2sp7/sem/stress-ng/60s/0x5003006

commit:
8c92606ab8 ("sched/cpuacct: Make user/system times in cpuacct.stat more precise")
8d0920b981 ("sched/fair: Fix detection of per-CPU kthreads waking a task")

8c92606ab81086db 8d0920b981b634bfedfd0746451
---------------- ---------------------------
       %stddev     %change         %stddev
           \          |                \
4.488e+08 +11.9% 5.023e+08 ± 2% stress-ng.sem.ops
7479868 +11.9% 8371718 ± 2% stress-ng.sem.ops_per_sec
44686811 ± 2% -43.4% 25289053 ± 9% stress-ng.time.involuntary_context_switches
19505 +13.5% 22136 ± 2% stress-ng.time.minor_page_faults
1099 +66.3% 1828 ± 4% stress-ng.time.percent_of_cpu_this_job_got
523.06 +44.7% 756.74 ± 4% stress-ng.time.system_time
159.55 ± 2% +136.8% 377.80 ± 16% stress-ng.time.user_time
2.244e+08 +11.9% 2.51e+08 ± 2% stress-ng.time.voluntary_context_switches
1.351e+08 +64.0% 2.215e+08 ± 4% cpuidle..usage
5.81 ± 44% +8.8 14.65 ± 7% mpstat.cpu.all.irq%
381.04 +2.0% 388.53 pmeter.Average_Active_Power
2457 ± 10% +26.5% 3109 ± 8% slabinfo.kmalloc-cg-16.active_objs
2457 ± 10% +26.5% 3109 ± 8% slabinfo.kmalloc-cg-16.num_objs
19769 ± 3% +18.6% 23443 ± 3% meminfo.Active
19514 ± 3% +18.8% 23188 ± 3% meminfo.Active(anon)
32952 ± 2% +15.2% 37965 ± 2% meminfo.Shmem
20.80 ± 8% +52.5% 31.71 ± 6% vmstat.procs.r
6251194 +22.7% 7669110 ± 2% vmstat.system.cs
1664035 -7.4% 1540404 vmstat.system.in
3221 ± 8% -49.1% 1640 ± 83% numa-vmstat.node0.nr_shmem
4430 ± 3% +23.6% 5476 ± 4% numa-vmstat.node1.nr_active_anon
798.40 ± 69% +400.5% 3996 ± 92% numa-vmstat.node1.nr_mapped
5018 ± 6% +56.4% 7850 ± 16% numa-vmstat.node1.nr_shmem
4430 ± 3% +23.6% 5476 ± 4% numa-vmstat.node1.nr_zone_active_anon
12885 ± 8% -49.1% 6563 ± 83% numa-meminfo.node0.Shmem
194184 ± 2% -18.6% 158144 ± 21% numa-meminfo.node0.Slab
17773 ± 3% +23.9% 22013 ± 4% numa-meminfo.node1.Active
17722 ± 3% +23.6% 21904 ± 4% numa-meminfo.node1.Active(anon)
3194 ± 69% +400.4% 15985 ± 92% numa-meminfo.node1.Mapped
1078298 ± 20% +87.5% 2021914 ± 56% numa-meminfo.node1.MemUsed
20072 ± 6% +56.5% 31404 ± 16% numa-meminfo.node1.Shmem
4878 ± 3% +18.8% 5797 ± 3% proc-vmstat.nr_active_anon
10268 +3.5% 10632 ± 2% proc-vmstat.nr_mapped
8237 ± 2% +15.2% 9491 ± 2% proc-vmstat.nr_shmem
4878 ± 3% +18.8% 5797 ± 3% proc-vmstat.nr_zone_active_anon
249939 ± 4% +58.8% 396814 ± 5% proc-vmstat.numa_pte_updates
11266 ± 3% +37.6% 15502 ± 4% proc-vmstat.pgactivate
351816 +2.0% 358879 proc-vmstat.pgfault
894.60 ± 2% +18.9% 1063 ± 3% turbostat.Avg_MHz
32.11 ± 2% +6.0 38.13 ± 3% turbostat.Busy%
55616227 ± 6% +255.0% 1.974e+08 ± 5% turbostat.C1
22.56 ± 5% +39.4 61.99 ± 2% turbostat.C1%
77386656 ± 3% -76.4% 18239341 ± 13% turbostat.C1E
47.00 ± 5% -35.6 11.41 ± 12% turbostat.C1E%
228.02 +3.2% 235.30 turbostat.PkgWatt
152.15 -2.7% 148.03 turbostat.RAMWatt
0.02 ± 78% -72.6% 0.01 ± 87% perf-sched.sch_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.05 ± 24% -73.8% 0.80 ± 67% perf-sched.sch_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
19.32 ± 12% -32.8% 12.98 ± 29% perf-sched.total_wait_and_delay.max.ms
19.31 ± 12% -33.6% 12.83 ± 30% perf-sched.total_wait_time.max.ms
1.77 ± 6% -49.1% 0.90 ± 86% perf-sched.wait_and_delay.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.16 ± 35% -53.7% 0.07 ± 67% perf-sched.wait_and_delay.avg.ms.preempt_schedule_common.__cond_resched.wait_for_completion.affine_move_task.__set_cpus_allowed_ptr_locked
3.52 ± 6% -49.0% 1.79 ± 86% perf-sched.wait_and_delay.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
3.05 ± 24% -58.1% 1.28 ± 27% perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
18.83 ± 14% -39.7% 11.36 ± 31% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork
1.75 ± 6% -48.9% 0.89 ± 86% perf-sched.wait_time.avg.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
0.16 ± 35% -53.7% 0.07 ± 67% perf-sched.wait_time.avg.ms.preempt_schedule_common.__cond_resched.wait_for_completion.affine_move_task.__set_cpus_allowed_ptr_locked
3.50 ± 6% -48.9% 1.79 ± 86% perf-sched.wait_time.max.ms.devkmsg_read.vfs_read.ksys_read.do_syscall_64
18.83 ± 14% -42.6% 10.81 ± 36% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork
15.68 ± 4% +31.4% 20.62 perf-stat.i.MPKI
9.107e+09 -7.1% 8.46e+09 ± 2% perf-stat.i.branch-instructions
2.07 -0.3 1.75 ± 2% perf-stat.i.branch-miss-rate%
1.826e+08 ± 2% -22.0% 1.424e+08 perf-stat.i.branch-misses
8.16 ± 18% -5.3 2.86 ± 12% perf-stat.i.cache-miss-rate%
61151992 ± 25% -64.4% 21751636 ± 15% perf-stat.i.cache-misses
7.555e+08 ± 6% +16.3% 8.789e+08 perf-stat.i.cache-references
6481277 +23.0% 7973471 ± 2% perf-stat.i.context-switches
1.86 +38.3% 2.57 ± 4% perf-stat.i.cpi
8.867e+10 ± 2% +22.7% 1.088e+11 ± 2% perf-stat.i.cpu-cycles
837057 ± 2% +208.6% 2583015 ± 3% perf-stat.i.cpu-migrations
1569 ± 17% +222.1% 5055 ± 15% perf-stat.i.cycles-between-cache-misses
0.04 ± 7% +0.1 0.09 ± 17% perf-stat.i.dTLB-load-miss-rate%
5151978 ± 6% +117.1% 11185554 ± 15% perf-stat.i.dTLB-load-misses
1.46e+10 ± 2% -16.6% 1.218e+10 ± 2% perf-stat.i.dTLB-loads
0.01 ± 6% +0.0 0.03 ± 11% perf-stat.i.dTLB-store-miss-rate%
1102322 ± 6% +127.4% 2506657 ± 9% perf-stat.i.dTLB-store-misses
7.926e+09 -6.3% 7.426e+09 ± 2% perf-stat.i.dTLB-stores
50435290 ± 2% +20.6% 60821528 ± 4% perf-stat.i.iTLB-load-misses
77671047 +17.3% 91078229 ± 3% perf-stat.i.iTLB-loads
4.716e+10 ± 2% -11.6% 4.168e+10 ± 2% perf-stat.i.instructions
1163 ± 2% -21.0% 918.78 ± 3% perf-stat.i.instructions-per-iTLB-miss
0.54 -26.4% 0.40 ± 4% perf-stat.i.ipc
0.92 ± 2% +22.7% 1.13 ± 2% perf-stat.i.metric.GHz
1423 ± 9% -32.4% 961.53 ± 42% perf-stat.i.metric.K/sec
337.33 ± 2% -10.5% 301.77 ± 2% perf-stat.i.metric.M/sec
3728 +2.7% 3827 perf-stat.i.minor-faults
37250868 ± 24% -79.1% 7777013 ± 6% perf-stat.i.node-load-misses
1755814 ± 36% -74.2% 453701 ± 23% perf-stat.i.node-loads
9763477 ± 25% -50.5% 4836086 ± 9% perf-stat.i.node-store-misses
490531 ± 6% -30.9% 338718 ± 33% perf-stat.i.node-stores
3740 +2.6% 3839 perf-stat.i.page-faults
16.01 ± 3% +31.8% 21.09 perf-stat.overall.MPKI
2.00 -0.3 1.68 ± 2% perf-stat.overall.branch-miss-rate%
8.03 ± 19% -5.5 2.48 ± 15% perf-stat.overall.cache-miss-rate%
1.88 +39.0% 2.61 ± 4% perf-stat.overall.cpi
1519 ± 17% +237.1% 5123 ± 15% perf-stat.overall.cycles-between-cache-misses
0.04 ± 8% +0.1 0.09 ± 17% perf-stat.overall.dTLB-load-miss-rate%
0.01 ± 7% +0.0 0.03 ± 11% perf-stat.overall.dTLB-store-miss-rate%
935.54 ± 2% -26.6% 686.48 ± 4% perf-stat.overall.instructions-per-iTLB-miss
0.53 -27.9% 0.38 ± 4% perf-stat.overall.ipc
8.962e+09 -7.1% 8.326e+09 ± 2% perf-stat.ps.branch-instructions
1.797e+08 ± 2% -22.0% 1.401e+08 perf-stat.ps.branch-misses
60177942 ± 25% -64.4% 21405117 ± 15% perf-stat.ps.cache-misses
7.434e+08 ± 6% +16.3% 8.649e+08 perf-stat.ps.cache-references
6377951 +23.0% 7846602 ± 2% perf-stat.ps.context-switches
8.726e+10 ± 2% +22.8% 1.071e+11 ± 2% perf-stat.ps.cpu-cycles
823714 ± 2% +208.6% 2541916 ± 3% perf-stat.ps.cpu-migrations
5069909 ± 6% +117.1% 11007308 ± 15% perf-stat.ps.dTLB-load-misses
1.436e+10 ± 2% -16.6% 1.199e+10 ± 2% perf-stat.ps.dTLB-loads
1084759 ± 6% +127.4% 2466721 ± 9% perf-stat.ps.dTLB-store-misses
7.8e+09 -6.3% 7.308e+09 ± 2% perf-stat.ps.dTLB-stores
49631270 ± 2% +20.6% 59853737 ± 4% perf-stat.ps.iTLB-load-misses
76432899 +17.3% 89629050 ± 3% perf-stat.ps.iTLB-loads
4.641e+10 ± 2% -11.6% 4.102e+10 ± 2% perf-stat.ps.instructions
3668 +2.6% 3764 perf-stat.ps.minor-faults
36657427 ± 24% -79.1% 7653097 ± 6% perf-stat.ps.node-load-misses
1727854 ± 36% -74.2% 446510 ± 23% perf-stat.ps.node-loads
9607970 ± 25% -50.5% 4758943 ± 9% perf-stat.ps.node-store-misses
482731 ± 6% -30.9% 333342 ± 33% perf-stat.ps.node-stores
3680 +2.6% 3776 perf-stat.ps.page-faults
2.934e+12 ± 2% -11.7% 2.591e+12 ± 3% perf-stat.total.instructions
22391 ± 4% -32.8% 15040 ± 9% softirqs.CPU0.SCHED
17344 ± 3% -31.3% 11909 ± 12% softirqs.CPU1.SCHED
16640 ± 6% -34.7% 10861 ± 11% softirqs.CPU10.SCHED
16417 ± 4% -33.4% 10931 ± 10% softirqs.CPU11.SCHED
16837 ± 8% -36.9% 10630 ± 9% softirqs.CPU12.SCHED
16286 ± 2% -30.8% 11267 ± 15% softirqs.CPU13.SCHED
16440 ± 3% -32.9% 11037 ± 9% softirqs.CPU14.SCHED
16151 ± 4% -34.1% 10639 ± 9% softirqs.CPU15.SCHED
16090 ± 3% -33.0% 10777 ± 8% softirqs.CPU16.SCHED
16372 ± 3% -31.8% 11158 ± 10% softirqs.CPU17.SCHED
16231 ± 2% -32.1% 11025 ± 8% softirqs.CPU18.SCHED
15929 ± 4% -32.7% 10727 ± 10% softirqs.CPU19.SCHED
17549 ± 5% -34.1% 11569 ± 9% softirqs.CPU2.SCHED
16270 ± 3% -33.2% 10871 ± 11% softirqs.CPU20.SCHED
16374 ± 4% -33.6% 10870 ± 7% softirqs.CPU21.SCHED
16472 ± 3% -33.1% 11021 ± 12% softirqs.CPU22.SCHED
16405 ± 2% -32.2% 11122 ± 12% softirqs.CPU23.SCHED
16580 ± 4% -36.2% 10578 ± 12% softirqs.CPU24.SCHED
15730 -32.6% 10598 ± 11% softirqs.CPU25.SCHED
15877 ± 2% -32.8% 10672 ± 10% softirqs.CPU26.SCHED
15912 ± 2% -31.3% 10925 ± 11% softirqs.CPU27.SCHED
15896 ± 2% -31.7% 10863 ± 11% softirqs.CPU28.SCHED
16045 -31.8% 10948 ± 10% softirqs.CPU29.SCHED
16489 ± 3% -31.6% 11278 ± 10% softirqs.CPU3.SCHED
15868 ± 2% -32.9% 10646 ± 11% softirqs.CPU30.SCHED
15988 ± 3% -33.2% 10687 ± 10% softirqs.CPU31.SCHED
15765 ± 2% -32.1% 10707 ± 12% softirqs.CPU32.SCHED
15797 -34.1% 10417 ± 11% softirqs.CPU33.SCHED
15921 -31.6% 10885 ± 13% softirqs.CPU34.SCHED
15881 ± 2% -31.7% 10852 ± 14% softirqs.CPU35.SCHED
16352 ± 7% -34.2% 10762 ± 14% softirqs.CPU36.SCHED
15932 ± 2% -34.1% 10493 ± 12% softirqs.CPU37.SCHED
15799 -32.2% 10707 ± 10% softirqs.CPU38.SCHED
15935 ± 2% -32.7% 10721 ± 7% softirqs.CPU39.SCHED
16240 ± 3% -35.7% 10447 ± 10% softirqs.CPU40.SCHED
16009 ± 2% -33.0% 10730 ± 13% softirqs.CPU41.SCHED
16160 ± 4% -35.7% 10387 ± 12% softirqs.CPU42.SCHED
15874 -34.5% 10403 ± 12% softirqs.CPU43.SCHED
15851 -34.2% 10431 ± 10% softirqs.CPU44.SCHED
15825 ± 3% -34.3% 10393 ± 15% softirqs.CPU45.SCHED
15785 ± 2% -32.3% 10689 ± 12% softirqs.CPU47.SCHED
16028 ± 3% -35.4% 10348 ± 12% softirqs.CPU48.SCHED
15899 ± 4% -31.2% 10939 ± 9% softirqs.CPU49.SCHED
16483 ± 3% -32.4% 11141 ± 12% softirqs.CPU5.SCHED
16548 -33.8% 10953 ± 11% softirqs.CPU50.SCHED
16411 ± 4% -31.4% 11265 ± 14% softirqs.CPU51.SCHED
15875 ± 2% -32.8% 10675 ± 9% softirqs.CPU52.SCHED
16317 ± 3% -32.1% 11079 ± 12% softirqs.CPU53.SCHED
16070 ± 2% -30.5% 11162 ± 7% softirqs.CPU54.SCHED
16195 ± 2% -32.7% 10893 ± 10% softirqs.CPU55.SCHED
16155 ± 2% -33.9% 10680 ± 10% softirqs.CPU56.SCHED
15984 ± 4% -32.8% 10739 ± 7% softirqs.CPU57.SCHED
16338 ± 3% -32.8% 10983 ± 13% softirqs.CPU58.SCHED
16604 ± 2% -35.2% 10755 ± 11% softirqs.CPU59.SCHED
16357 ± 4% -32.8% 10988 ± 12% softirqs.CPU6.SCHED
16878 ± 8% -35.2% 10934 ± 8% softirqs.CPU60.SCHED
16570 ± 4% -35.5% 10693 ± 8% softirqs.CPU61.SCHED
16652 ± 5% -35.6% 10727 ± 9% softirqs.CPU62.SCHED
16652 ± 5% -34.1% 10972 ± 9% softirqs.CPU63.SCHED
16377 ± 3% -34.0% 10811 ± 12% softirqs.CPU64.SCHED
16324 ± 3% -31.8% 11128 ± 12% softirqs.CPU65.SCHED
16442 ± 3% -32.4% 11111 ± 11% softirqs.CPU66.SCHED
16730 ± 5% -32.2% 11337 ± 12% softirqs.CPU67.SCHED
16409 ± 2% -33.4% 10934 ± 9% softirqs.CPU68.SCHED
16157 ± 2% -33.1% 10815 ± 10% softirqs.CPU69.SCHED
16004 ± 2% -32.3% 10831 ± 9% softirqs.CPU7.SCHED
16374 ± 3% -31.9% 11157 ± 10% softirqs.CPU70.SCHED
16319 -32.8% 10968 ± 9% softirqs.CPU71.SCHED
16194 ± 2% -35.4% 10455 ± 10% softirqs.CPU72.SCHED
15911 -34.8% 10370 ± 11% softirqs.CPU73.SCHED
17109 ± 8% -38.5% 10530 ± 12% softirqs.CPU74.SCHED
15953 ± 2% -31.9% 10864 ± 12% softirqs.CPU75.SCHED
15986 ± 2% -33.1% 10691 ± 13% softirqs.CPU76.SCHED
16008 -33.3% 10669 ± 11% softirqs.CPU77.SCHED
16064 ± 2% -33.4% 10705 ± 13% softirqs.CPU78.SCHED
16202 ± 5% -35.0% 10526 ± 11% softirqs.CPU79.SCHED
16228 ± 3% -32.0% 11027 ± 8% softirqs.CPU8.SCHED
15833 ± 2% -32.8% 10636 ± 12% softirqs.CPU80.SCHED
16463 ± 3% -36.3% 10493 ± 10% softirqs.CPU81.SCHED
15911 ± 2% -32.0% 10820 ± 11% softirqs.CPU82.SCHED
16034 ± 2% -34.1% 10569 ± 10% softirqs.CPU83.SCHED
15904 -33.7% 10543 ± 12% softirqs.CPU84.SCHED
15722 -33.2% 10496 ± 12% softirqs.CPU85.SCHED
16013 ± 4% -32.9% 10740 ± 11% softirqs.CPU86.SCHED
16004 ± 2% -30.6% 11114 ± 18% softirqs.CPU87.SCHED
16698 ± 10% -33.8% 11054 ± 14% softirqs.CPU88.SCHED
16043 ± 3% -33.6% 10651 ± 11% softirqs.CPU89.SCHED
16303 ± 3% -33.5% 10835 ± 10% softirqs.CPU9.SCHED
15888 ± 2% -33.1% 10633 ± 11% softirqs.CPU90.SCHED
16115 ± 5% -34.7% 10516 ± 10% softirqs.CPU91.SCHED
16135 ± 3% -34.4% 10585 ± 10% softirqs.CPU92.SCHED
15604 ± 3% -31.2% 10735 ± 8% softirqs.CPU93.SCHED
15747 ± 2% -32.1% 10696 ± 10% softirqs.CPU94.SCHED
16121 ± 2% -35.3% 10435 ± 10% softirqs.CPU95.SCHED
1560425 -33.2% 1042262 ± 10% softirqs.SCHED
10914 ± 12% -46.2% 5872 ± 16% interrupts.CPU0.RES:Rescheduling_interrupts
10326 ± 7% -47.0% 5473 ± 17% interrupts.CPU1.RES:Rescheduling_interrupts
3165 ± 18% +57.0% 4969 ± 2% interrupts.CPU10.NMI:Non-maskable_interrupts
3165 ± 18% +57.0% 4969 ± 2% interrupts.CPU10.PMI:Performance_monitoring_interrupts
11496 ± 16% -52.6% 5449 ± 15% interrupts.CPU10.RES:Rescheduling_interrupts
10368 ± 15% -48.4% 5352 ± 17% interrupts.CPU11.RES:Rescheduling_interrupts
10655 ± 14% -48.3% 5509 ± 15% interrupts.CPU12.RES:Rescheduling_interrupts
3263 ± 22% +52.7% 4981 ± 2% interrupts.CPU13.NMI:Non-maskable_interrupts
3263 ± 22% +52.7% 4981 ± 2% interrupts.CPU13.PMI:Performance_monitoring_interrupts
10447 ± 12% -46.2% 5623 ± 21% interrupts.CPU13.RES:Rescheduling_interrupts
10601 ± 12% -49.6% 5346 ± 15% interrupts.CPU14.RES:Rescheduling_interrupts
10578 ± 15% -49.4% 5354 ± 16% interrupts.CPU15.RES:Rescheduling_interrupts
10673 ± 11% -49.7% 5366 ± 16% interrupts.CPU16.RES:Rescheduling_interrupts
2596 ± 27% +91.5% 4970 ± 2% interrupts.CPU17.NMI:Non-maskable_interrupts
2596 ± 27% +91.5% 4970 ± 2% interrupts.CPU17.PMI:Performance_monitoring_interrupts
10042 ± 16% -47.9% 5234 ± 16% interrupts.CPU17.RES:Rescheduling_interrupts
10394 ± 12% -45.6% 5651 ± 17% interrupts.CPU18.RES:Rescheduling_interrupts
9978 ± 14% -46.1% 5375 ± 15% interrupts.CPU19.RES:Rescheduling_interrupts
11767 ± 21% -53.1% 5519 ± 16% interrupts.CPU2.RES:Rescheduling_interrupts
10646 ± 14% -49.4% 5390 ± 15% interrupts.CPU20.RES:Rescheduling_interrupts
2567 ± 35% +79.0% 4595 ± 18% interrupts.CPU21.NMI:Non-maskable_interrupts
2567 ± 35% +79.0% 4595 ± 18% interrupts.CPU21.PMI:Performance_monitoring_interrupts
10407 ± 13% -48.4% 5368 ± 16% interrupts.CPU21.RES:Rescheduling_interrupts
10089 ± 14% -47.2% 5329 ± 16% interrupts.CPU22.RES:Rescheduling_interrupts
2686 ± 38% +71.3% 4602 ± 18% interrupts.CPU23.NMI:Non-maskable_interrupts
2686 ± 38% +71.3% 4602 ± 18% interrupts.CPU23.PMI:Performance_monitoring_interrupts
10006 ± 14% -48.1% 5193 ± 16% interrupts.CPU23.RES:Rescheduling_interrupts
2871 ± 34% +41.6% 4065 ± 31% interrupts.CPU24.NMI:Non-maskable_interrupts
2871 ± 34% +41.6% 4065 ± 31% interrupts.CPU24.PMI:Performance_monitoring_interrupts
12976 ± 17% -56.5% 5641 ± 14% interrupts.CPU24.RES:Rescheduling_interrupts
2869 ± 35% +63.4% 4687 ± 18% interrupts.CPU25.NMI:Non-maskable_interrupts
2869 ± 35% +63.4% 4687 ± 18% interrupts.CPU25.PMI:Performance_monitoring_interrupts
12226 ± 17% -58.1% 5123 ± 11% interrupts.CPU25.RES:Rescheduling_interrupts
2926 ± 32% +72.6% 5049 interrupts.CPU26.NMI:Non-maskable_interrupts
2926 ± 32% +72.6% 5049 interrupts.CPU26.PMI:Performance_monitoring_interrupts
11990 ± 16% -56.6% 5203 ± 12% interrupts.CPU26.RES:Rescheduling_interrupts
3184 ± 31% +59.4% 5075 interrupts.CPU27.NMI:Non-maskable_interrupts
3184 ± 31% +59.4% 5075 interrupts.CPU27.PMI:Performance_monitoring_interrupts
11858 ± 17% -56.6% 5146 ± 12% interrupts.CPU27.RES:Rescheduling_interrupts
3673 ± 21% +37.9% 5066 interrupts.CPU28.NMI:Non-maskable_interrupts
3673 ± 21% +37.9% 5066 interrupts.CPU28.PMI:Performance_monitoring_interrupts
12167 ± 18% -58.4% 5060 ± 12% interrupts.CPU28.RES:Rescheduling_interrupts
3640 ± 23% +38.9% 5058 interrupts.CPU29.NMI:Non-maskable_interrupts
3640 ± 23% +38.9% 5058 interrupts.CPU29.PMI:Performance_monitoring_interrupts
11866 ± 16% -58.9% 4873 ± 11% interrupts.CPU29.RES:Rescheduling_interrupts
10672 ± 9% -48.8% 5465 ± 17% interrupts.CPU3.RES:Rescheduling_interrupts
4128 ± 2% +22.8% 5068 interrupts.CPU30.NMI:Non-maskable_interrupts
4128 ± 2% +22.8% 5068 interrupts.CPU30.PMI:Performance_monitoring_interrupts
11897 ± 19% -58.3% 4957 ± 9% interrupts.CPU30.RES:Rescheduling_interrupts
4070 +24.3% 5058 interrupts.CPU31.NMI:Non-maskable_interrupts
4070 +24.3% 5058 interrupts.CPU31.PMI:Performance_monitoring_interrupts
11771 ± 15% -56.7% 5096 ± 10% interrupts.CPU31.RES:Rescheduling_interrupts
12028 ± 19% -57.6% 5103 ± 11% interrupts.CPU32.RES:Rescheduling_interrupts
11789 ± 16% -57.4% 5023 ± 12% interrupts.CPU33.RES:Rescheduling_interrupts
11954 ± 17% -58.2% 4998 ± 11% interrupts.CPU34.RES:Rescheduling_interrupts
11922 ± 16% -57.9% 5020 ± 11% interrupts.CPU35.RES:Rescheduling_interrupts
12005 ± 16% -58.1% 5034 ± 11% interrupts.CPU36.RES:Rescheduling_interrupts
12348 ± 14% -57.4% 5257 ± 11% interrupts.CPU37.RES:Rescheduling_interrupts
12417 ± 16% -58.7% 5129 ± 12% interrupts.CPU38.RES:Rescheduling_interrupts
12090 ± 17% -58.0% 5076 ± 11% interrupts.CPU39.RES:Rescheduling_interrupts
9627 ± 14% -44.4% 5351 ± 17% interrupts.CPU4.RES:Rescheduling_interrupts
11957 ± 18% -58.6% 4947 ± 10% interrupts.CPU40.RES:Rescheduling_interrupts
12107 ± 17% -57.9% 5091 ± 14% interrupts.CPU41.RES:Rescheduling_interrupts
12168 ± 19% -56.3% 5319 ± 11% interrupts.CPU42.RES:Rescheduling_interrupts
11956 ± 18% -57.6% 5063 ± 12% interrupts.CPU43.RES:Rescheduling_interrupts
12105 ± 17% -57.5% 5149 ± 11% interrupts.CPU44.RES:Rescheduling_interrupts
11557 ± 16% -56.2% 5064 ± 11% interrupts.CPU45.RES:Rescheduling_interrupts
12108 ± 12% -58.8% 4985 ± 12% interrupts.CPU46.RES:Rescheduling_interrupts
11660 ± 18% -56.7% 5046 ± 14% interrupts.CPU47.RES:Rescheduling_interrupts
10560 ± 12% -47.5% 5542 ± 14% interrupts.CPU48.RES:Rescheduling_interrupts
10652 ± 9% -48.6% 5474 ± 15% interrupts.CPU49.RES:Rescheduling_interrupts
10503 ± 9% -48.9% 5366 ± 16% interrupts.CPU5.RES:Rescheduling_interrupts
10515 ± 9% -48.7% 5389 ± 17% interrupts.CPU50.RES:Rescheduling_interrupts
10514 ± 8% -47.8% 5486 ± 15% interrupts.CPU51.RES:Rescheduling_interrupts
11152 ± 13% -52.2% 5336 ± 17% interrupts.CPU52.RES:Rescheduling_interrupts
10148 ± 10% -48.1% 5269 ± 18% interrupts.CPU53.RES:Rescheduling_interrupts
10387 ± 9% -49.1% 5290 ± 12% interrupts.CPU54.RES:Rescheduling_interrupts
3440 ± 4% +44.1% 4955 ± 2% interrupts.CPU55.NMI:Non-maskable_interrupts
3440 ± 4% +44.1% 4955 ± 2% interrupts.CPU55.PMI:Performance_monitoring_interrupts
10878 ± 12% -50.0% 5443 ± 16% interrupts.CPU55.RES:Rescheduling_interrupts
3694 ± 7% +34.3% 4960 ± 2% interrupts.CPU56.NMI:Non-maskable_interrupts
3694 ± 7% +34.3% 4960 ± 2% interrupts.CPU56.PMI:Performance_monitoring_interrupts
10650 ± 8% -49.2% 5405 ± 17% interrupts.CPU56.RES:Rescheduling_interrupts
3609 ± 4% +37.4% 4957 ± 2% interrupts.CPU57.NMI:Non-maskable_interrupts
3609 ± 4% +37.4% 4957 ± 2% interrupts.CPU57.PMI:Performance_monitoring_interrupts
10341 ± 12% -48.7% 5304 ± 18% interrupts.CPU57.RES:Rescheduling_interrupts
3224 ± 25% +54.1% 4967 ± 2% interrupts.CPU58.NMI:Non-maskable_interrupts
3224 ± 25% +54.1% 4967 ± 2% interrupts.CPU58.PMI:Performance_monitoring_interrupts
11137 ± 11% -51.5% 5397 ± 17% interrupts.CPU58.RES:Rescheduling_interrupts
10332 ± 13% -49.5% 5216 ± 17% interrupts.CPU59.RES:Rescheduling_interrupts
10312 ± 14% -48.3% 5329 ± 18% interrupts.CPU6.RES:Rescheduling_interrupts
11594 ± 26% -53.3% 5409 ± 16% interrupts.CPU60.RES:Rescheduling_interrupts
11154 ± 15% -50.6% 5505 ± 16% interrupts.CPU61.RES:Rescheduling_interrupts
10692 ± 12% -48.1% 5546 ± 18% interrupts.CPU62.RES:Rescheduling_interrupts
10114 ± 11% -47.3% 5333 ± 16% interrupts.CPU63.RES:Rescheduling_interrupts
10960 ± 11% -51.5% 5316 ± 16% interrupts.CPU64.RES:Rescheduling_interrupts
3800 ± 8% +30.6% 4965 ± 2% interrupts.CPU65.NMI:Non-maskable_interrupts
3800 ± 8% +30.6% 4965 ± 2% interrupts.CPU65.PMI:Performance_monitoring_interrupts
10451 ± 14% -49.8% 5249 ± 16% interrupts.CPU65.RES:Rescheduling_interrupts
10984 ± 10% -49.3% 5571 ± 16% interrupts.CPU66.RES:Rescheduling_interrupts
3743 ± 10% +32.6% 4965 ± 2% interrupts.CPU67.NMI:Non-maskable_interrupts
3743 ± 10% +32.6% 4965 ± 2% interrupts.CPU67.PMI:Performance_monitoring_interrupts
10871 ± 15% -49.8% 5458 ± 15% interrupts.CPU67.RES:Rescheduling_interrupts
10692 ± 11% -49.8% 5371 ± 15% interrupts.CPU68.RES:Rescheduling_interrupts
10518 ± 11% -48.2% 5453 ± 17% interrupts.CPU69.RES:Rescheduling_interrupts
10411 ± 12% -47.1% 5507 ± 16% interrupts.CPU7.RES:Rescheduling_interrupts
10435 ± 13% -48.8% 5345 ± 16% interrupts.CPU70.RES:Rescheduling_interrupts
10532 ± 14% -50.1% 5254 ± 18% interrupts.CPU71.RES:Rescheduling_interrupts
12972 ± 17% -57.0% 5582 ± 13% interrupts.CPU72.RES:Rescheduling_interrupts
4086 ± 2% +23.5% 5049 ± 2% interrupts.CPU73.NMI:Non-maskable_interrupts
4086 ± 2% +23.5% 5049 ± 2% interrupts.CPU73.PMI:Performance_monitoring_interrupts
12049 ± 14% -56.2% 5282 ± 11% interrupts.CPU73.RES:Rescheduling_interrupts
13200 ± 28% -60.5% 5216 ± 12% interrupts.CPU74.RES:Rescheduling_interrupts
3964 ± 3% +28.0% 5075 ± 2% interrupts.CPU75.NMI:Non-maskable_interrupts
3964 ± 3% +28.0% 5075 ± 2% interrupts.CPU75.PMI:Performance_monitoring_interrupts
11601 ± 15% -55.4% 5175 ± 12% interrupts.CPU75.RES:Rescheduling_interrupts
4088 ± 2% +23.9% 5066 interrupts.CPU76.NMI:Non-maskable_interrupts
4088 ± 2% +23.9% 5066 interrupts.CPU76.PMI:Performance_monitoring_interrupts
12116 ± 16% -58.1% 5071 ± 12% interrupts.CPU76.RES:Rescheduling_interrupts
4027 ± 4% +25.7% 5064 interrupts.CPU77.NMI:Non-maskable_interrupts
4027 ± 4% +25.7% 5064 interrupts.CPU77.PMI:Performance_monitoring_interrupts
11861 ± 17% -58.5% 4926 ± 12% interrupts.CPU77.RES:Rescheduling_interrupts
4129 ± 2% +22.9% 5074 interrupts.CPU78.NMI:Non-maskable_interrupts
4129 ± 2% +22.9% 5074 interrupts.CPU78.PMI:Performance_monitoring_interrupts
11823 ± 19% -57.8% 4994 ± 12% interrupts.CPU78.RES:Rescheduling_interrupts
4072 +24.2% 5059 interrupts.CPU79.NMI:Non-maskable_interrupts
4072 +24.2% 5059 interrupts.CPU79.PMI:Performance_monitoring_interrupts
11875 ± 18% -56.8% 5132 ± 11% interrupts.CPU79.RES:Rescheduling_interrupts
3286 ± 19% +50.8% 4956 ± 2% interrupts.CPU8.NMI:Non-maskable_interrupts
3286 ± 19% +50.8% 4956 ± 2% interrupts.CPU8.PMI:Performance_monitoring_interrupts
10577 ± 10% -48.9% 5400 ± 16% interrupts.CPU8.RES:Rescheduling_interrupts
4076 +23.9% 5050 interrupts.CPU80.NMI:Non-maskable_interrupts
4076 +23.9% 5050 interrupts.CPU80.PMI:Performance_monitoring_interrupts
11729 ± 18% -56.2% 5140 ± 11% interrupts.CPU80.RES:Rescheduling_interrupts
4101 ± 3% +23.1% 5046 ± 2% interrupts.CPU81.NMI:Non-maskable_interrupts
4101 ± 3% +23.1% 5046 ± 2% interrupts.CPU81.PMI:Performance_monitoring_interrupts
11689 ± 17% -55.8% 5167 ± 11% interrupts.CPU81.RES:Rescheduling_interrupts
4064 ± 2% +25.0% 5078 ± 2% interrupts.CPU82.NMI:Non-maskable_interrupts
4064 ± 2% +25.0% 5078 ± 2% interrupts.CPU82.PMI:Performance_monitoring_interrupts
11891 ± 18% -57.5% 5058 ± 12% interrupts.CPU82.RES:Rescheduling_interrupts
4090 ± 2% +23.9% 5069 interrupts.CPU83.NMI:Non-maskable_interrupts
4090 ± 2% +23.9% 5069 interrupts.CPU83.PMI:Performance_monitoring_interrupts
12084 ± 18% -59.3% 4922 ± 11% interrupts.CPU83.RES:Rescheduling_interrupts
3969 ± 3% +27.7% 5067 interrupts.CPU84.NMI:Non-maskable_interrupts
3969 ± 3% +27.7% 5067 interrupts.CPU84.PMI:Performance_monitoring_interrupts
11904 ± 17% -56.8% 5142 ± 12% interrupts.CPU84.RES:Rescheduling_interrupts
12313 ± 14% -57.0% 5293 ± 10% interrupts.CPU85.RES:Rescheduling_interrupts
12290 ± 16% -57.7% 5199 ± 11% interrupts.CPU86.RES:Rescheduling_interrupts
11551 ± 13% -56.0% 5084 ± 10% interrupts.CPU87.RES:Rescheduling_interrupts
12229 ± 17% -59.0% 5011 ± 12% interrupts.CPU88.RES:Rescheduling_interrupts
11836 ± 15% -58.6% 4904 ± 10% interrupts.CPU89.RES:Rescheduling_interrupts
10371 ± 14% -48.0% 5396 ± 16% interrupts.CPU9.RES:Rescheduling_interrupts
12005 ± 18% -56.1% 5271 ± 11% interrupts.CPU90.RES:Rescheduling_interrupts
11714 ± 14% -55.5% 5217 ± 12% interrupts.CPU91.RES:Rescheduling_interrupts
11997 ± 16% -57.8% 5063 ± 11% interrupts.CPU92.RES:Rescheduling_interrupts
12042 ± 16% -58.1% 5051 ± 13% interrupts.CPU93.RES:Rescheduling_interrupts
12016 ± 16% -58.2% 5027 ± 11% interrupts.CPU94.RES:Rescheduling_interrupts
12255 ± 17% -60.6% 4824 ± 13% interrupts.CPU95.RES:Rescheduling_interrupts
351763 ± 3% +26.9% 446394 ± 4% interrupts.NMI:Non-maskable_interrupts
351763 ± 3% +26.9% 446394 ± 4% interrupts.PMI:Performance_monitoring_interrupts
1086124 ± 13% -53.5% 504773 ± 13% interrupts.RES:Rescheduling_interrupts
1915 ± 6% +24.5% 2384 ± 8% interrupts.TLB:TLB_shootdowns
17.37 ± 6% -16.7 0.71 ± 21% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.do_nanosleep.hrtimer_nanosleep
16.97 ± 7% -16.6 0.40 ± 87% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.do_nanosleep
30.20 ± 3% -15.8 14.37 ± 2% perf-profile.calltrace.cycles-pp.__x64_sys_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe.__nanosleep
27.12 ± 4% -15.8 11.34 ± 2% perf-profile.calltrace.cycles-pp.schedule.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
26.99 ± 4% -15.7 11.26 ± 2% perf-profile.calltrace.cycles-pp.__schedule.schedule.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep
29.22 ± 3% -15.7 13.51 ± 2% perf-profile.calltrace.cycles-pp.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe
29.58 ± 3% -15.7 13.87 ± 2% perf-profile.calltrace.cycles-pp.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64.entry_SYSCALL_64_after_hwframe.__nanosleep
33.34 ± 2% -15.2 18.10 ± 2% perf-profile.calltrace.cycles-pp.__nanosleep
31.51 ± 3% -15.2 16.30 ± 3% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__nanosleep
31.23 ± 3% -15.2 16.07 ± 3% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__nanosleep
15.22 ± 7% -15.1 0.15 ±158% perf-profile.calltrace.cycles-pp.load_balance.newidle_balance.pick_next_task_fair.__schedule.schedule
10.72 ± 8% -10.7 0.00 perf-profile.calltrace.cycles-pp.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair.__schedule
10.49 ± 8% -10.5 0.00 perf-profile.calltrace.cycles-pp.update_sd_lb_stats.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair
5.71 ± 6% -5.5 0.17 ±158% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
5.25 ± 6% -5.1 0.16 ±158% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
3.43 -1.0 2.41 perf-profile.calltrace.cycles-pp.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
2.40 -0.4 1.96 perf-profile.calltrace.cycles-pp.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe.__sched_yield
2.35 -0.4 1.92 perf-profile.calltrace.cycles-pp.__schedule.schedule.__x64_sys_sched_yield.do_syscall_64.entry_SYSCALL_64_after_hwframe
1.43 -0.2 1.25 ± 2% perf-profile.calltrace.cycles-pp.hrtimer_start_range_ns.do_nanosleep.hrtimer_nanosleep.__x64_sys_nanosleep.do_syscall_64
0.42 ± 44% +0.2 0.65 ± 3% perf-profile.calltrace.cycles-pp.tick_nohz_get_sleep_length.menu_select.do_idle.cpu_startup_entry.start_secondary
0.55 ± 45% +0.4 0.92 ± 2% perf-profile.calltrace.cycles-pp.update_load_avg.dequeue_entity.dequeue_task_fair.__schedule.schedule
1.08 ± 7% +0.4 1.47 ± 4% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.do_nanosleep.hrtimer_nanosleep
0.78 ± 44% +0.4 1.18 ± 4% perf-profile.calltrace.cycles-pp.switch_mm_irqs_off.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.46 ± 45% +0.5 0.91 ± 4% perf-profile.calltrace.cycles-pp.restore_fpregs_from_fpstate.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64
0.96 ± 44% +0.5 1.45 ± 4% perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.48 ± 50% +0.5 1.02 ± 7% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.66 ± 45% +0.6 1.25 ± 3% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__nanosleep
0.62 ± 44% +0.6 1.23 ± 3% perf-profile.calltrace.cycles-pp.switch_fpu_return.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe
0.33 ± 82% +0.7 0.98 ± 8% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.finish_task_switch.__schedule.schedule_idle.do_idle
0.00 +0.7 0.65 ± 13% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.finish_task_switch.__schedule.schedule
0.00 +0.7 0.67 ± 13% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.finish_task_switch.__schedule.schedule.do_nanosleep
0.35 ± 70% +0.7 1.03 ± 4% perf-profile.calltrace.cycles-pp.select_task_rq_fair.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt
0.88 ± 4% +0.7 1.57 ± 40% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__nanosleep
0.22 ±122% +0.7 0.95 ± 8% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.finish_task_switch.__schedule.schedule_idle
0.00 +0.7 0.74 ± 4% perf-profile.calltrace.cycles-pp.__switch_to_asm
0.00 +0.8 0.84 ± 13% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__schedule.schedule_idle.do_idle
0.00 +0.9 0.87 ± 5% perf-profile.calltrace.cycles-pp.select_idle_sibling.select_task_rq_fair.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues
0.00 +1.0 1.01 ± 11% perf-profile.calltrace.cycles-pp._raw_spin_lock.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.25 ±100% +1.3 1.51 ± 15% perf-profile.calltrace.cycles-pp._raw_spin_lock.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt
0.00 +1.3 1.28 ± 18% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues
0.00 +1.4 1.37 ± 4% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.poll_idle
0.00 +1.4 1.37 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.poll_idle.cpuidle_enter_state
0.00 +1.4 1.41 ± 4% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.poll_idle.cpuidle_enter_state.cpuidle_enter
0.00 +1.4 1.44 ± 4% perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.poll_idle.cpuidle_enter_state.cpuidle_enter.do_idle
0.11 ±200% +1.5 1.57 ± 10% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.finish_task_switch
0.11 ±200% +1.5 1.58 ± 10% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.finish_task_switch.__schedule
0.00 +1.6 1.57 ± 3% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
5.19 ± 6% +1.6 6.79 ± 4% perf-profile.calltrace.cycles-pp.dequeue_task_fair.__schedule.schedule.do_nanosleep.hrtimer_nanosleep
0.70 ± 45% +1.8 2.49 ± 4% perf-profile.calltrace.cycles-pp.update_load_avg.set_next_entity.pick_next_task_fair.__schedule.schedule_idle
1.33 ± 4% +2.0 3.35 ± 3% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule_idle.do_idle.cpu_startup_entry
0.94 ± 44% +2.1 3.07 ± 3% perf-profile.calltrace.cycles-pp.set_next_entity.pick_next_task_fair.__schedule.schedule_idle.do_idle
0.82 ± 71% +2.2 3.03 ± 7% perf-profile.calltrace.cycles-pp.update_cfs_group.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.58 ± 44% +2.3 2.88 ± 5% perf-profile.calltrace.cycles-pp.update_load_avg.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up
0.00 +2.4 2.43 ± 7% perf-profile.calltrace.cycles-pp.set_task_cpu.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt
3.50 ± 6% +2.5 6.04 ± 4% perf-profile.calltrace.cycles-pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.do_nanosleep
1.40 ± 47% +2.6 3.99 ± 7% perf-profile.calltrace.cycles-pp.update_cfs_group.dequeue_entity.dequeue_task_fair.__schedule.schedule
4.87 ± 4% +3.6 8.48 ± 7% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt
4.83 ± 4% +3.6 8.44 ± 7% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues
4.57 ± 3% +3.9 8.45 ± 2% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
4.50 ± 3% +3.9 8.39 ± 2% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
3.29 ± 4% +4.1 7.38 ± 6% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.hrtimer_wakeup
9.21 ± 3% +8.2 17.44 ± 5% perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
8.51 ± 3% +8.3 16.81 ± 6% perf-profile.calltrace.cycles-pp.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
8.43 ± 3% +8.3 16.74 ± 6% perf-profile.calltrace.cycles-pp.try_to_wake_up.hrtimer_wakeup.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
34.67 +9.6 44.24 perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
5.36 ± 2% +9.9 15.22 ± 4% perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.intel_idle
5.40 ± 2% +9.9 15.28 ± 4% perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.intel_idle.cpuidle_enter_state
5.74 +10.0 15.75 ± 4% perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.intel_idle.cpuidle_enter_state.cpuidle_enter
49.86 +11.3 61.13 perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
49.43 +11.6 61.07 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
56.80 +15.9 72.69 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
56.85 +15.9 72.74 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
56.87 +15.9 72.76 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
57.48 +16.1 73.53 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
22.29 ± 3% +22.4 44.73 perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle
16.99 ± 7% -16.4 0.57 ± 27% perf-profile.children.cycles-pp.newidle_balance
29.53 ± 3% -16.2 13.31 ± 2% perf-profile.children.cycles-pp.schedule
35.46 ± 2% -16.1 19.33 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
35.10 ± 2% -16.1 19.04 ± 5% perf-profile.children.cycles-pp.do_syscall_64
30.21 ± 3% -15.8 14.38 ± 2% perf-profile.children.cycles-pp.__x64_sys_nanosleep
29.24 ± 3% -15.7 13.52 ± 2% perf-profile.children.cycles-pp.do_nanosleep
29.59 ± 3% -15.7 13.87 ± 2% perf-profile.children.cycles-pp.hrtimer_nanosleep
33.48 ± 2% -15.3 18.22 ± 2% perf-profile.children.cycles-pp.__nanosleep
15.30 ± 7% -14.9 0.42 ± 30% perf-profile.children.cycles-pp.load_balance
19.78 ± 5% -14.8 4.97 ± 3% perf-profile.children.cycles-pp.pick_next_task_fair
33.95 ± 2% -12.2 21.72 ± 2% perf-profile.children.cycles-pp.__schedule
10.77 ± 8% -10.5 0.27 ± 32% perf-profile.children.cycles-pp.find_busiest_group
10.58 ± 8% -10.3 0.26 ± 33% perf-profile.children.cycles-pp.update_sd_lb_stats
2.06 ± 20% -2.0 0.06 ± 16% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
1.49 ± 11% -1.4 0.06 ± 20% perf-profile.children.cycles-pp.idle_cpu
5.09 ± 9% -1.1 4.02 ± 9% perf-profile.children.cycles-pp._raw_spin_lock
3.44 -1.0 2.42 perf-profile.children.cycles-pp.__x64_sys_sched_yield
1.04 ± 8% -0.9 0.15 ± 15% perf-profile.children.cycles-pp.update_blocked_averages
0.79 ± 7% -0.7 0.10 ± 7% perf-profile.children.cycles-pp._find_next_bit
0.96 -0.6 0.40 ± 3% perf-profile.children.cycles-pp.do_sched_yield
0.95 ± 5% -0.4 0.50 ± 3% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
1.05 ± 2% -0.3 0.72 ± 3% perf-profile.children.cycles-pp.clockevents_program_event
1.77 -0.3 1.45 ± 2% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.91 -0.3 0.63 ± 4% perf-profile.children.cycles-pp.sched_clock_cpu
0.79 ± 2% -0.2 0.57 ± 4% perf-profile.children.cycles-pp.native_sched_clock
0.76 ± 2% -0.2 0.54 ± 2% perf-profile.children.cycles-pp.lapic_next_deadline
1.30 ± 2% -0.2 1.10 ± 3% perf-profile.children.cycles-pp.update_rq_clock
0.31 ± 5% -0.2 0.11 ± 11% perf-profile.children.cycles-pp.yield_task_fair
1.02 ± 4% -0.2 0.82 ± 4% perf-profile.children.cycles-pp.sem_getvalue@@GLIBC_2.2.5
0.48 ± 3% -0.2 0.29 ± 3% perf-profile.children.cycles-pp.reweight_entity
1.44 -0.2 1.27 ± 2% perf-profile.children.cycles-pp.hrtimer_start_range_ns
0.86 ± 7% -0.1 0.72 ± 5% perf-profile.children.cycles-pp.semaphore_posix_thrash
0.41 -0.1 0.28 ± 4% perf-profile.children.cycles-pp.irq_exit_rcu
0.41 ± 8% -0.1 0.29 ± 4% perf-profile.children.cycles-pp.migrate_task_rq_fair
0.61 ± 2% -0.1 0.49 ± 4% perf-profile.children.cycles-pp.__update_load_avg_se
0.23 ± 13% -0.1 0.11 ± 4% perf-profile.children.cycles-pp.place_entity
0.27 ± 4% -0.1 0.14 ± 3% perf-profile.children.cycles-pp.irq_enter_rcu
0.62 ± 2% -0.1 0.51 ± 3% perf-profile.children.cycles-pp.ktime_get
0.38 ± 4% -0.1 0.27 ± 2% perf-profile.children.cycles-pp.native_irq_return_iret
0.83 -0.1 0.71 ± 2% perf-profile.children.cycles-pp.load_new_mm_cr3
0.24 ± 4% -0.1 0.12 ± 3% perf-profile.children.cycles-pp.tick_irq_enter
1.59 ± 2% -0.1 1.50 ± 2% perf-profile.children.cycles-pp.update_curr
0.35 ± 2% -0.1 0.26 ± 2% perf-profile.children.cycles-pp.pick_next_entity
0.12 ± 18% -0.1 0.04 ± 63% perf-profile.children.cycles-pp.get_nohz_timer_target
0.44 ± 3% -0.1 0.36 ± 3% perf-profile.children.cycles-pp.syscall_return_via_sysret
0.20 ± 4% -0.1 0.12 ± 5% perf-profile.children.cycles-pp.put_prev_entity
0.49 ± 5% -0.1 0.41 ± 3% perf-profile.children.cycles-pp.sem_post@@GLIBC_2.2.5
0.37 ± 3% -0.1 0.30 ± 3% perf-profile.children.cycles-pp.save_fpregs_to_fpstate
0.26 ± 4% -0.1 0.19 ± 8% perf-profile.children.cycles-pp.tick_sched_timer
0.45 -0.1 0.38 ± 4% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
0.44 ± 2% -0.1 0.38 ± 2% perf-profile.children.cycles-pp.get_timespec64
0.12 ± 6% -0.1 0.06 ± 7% perf-profile.children.cycles-pp.rebalance_domains
0.22 ± 5% -0.1 0.16 ± 4% perf-profile.children.cycles-pp.__softirqentry_text_start
0.24 ± 4% -0.1 0.17 ± 8% perf-profile.children.cycles-pp.tick_sched_handle
0.23 ± 5% -0.1 0.17 ± 9% perf-profile.children.cycles-pp.update_process_times
0.38 -0.1 0.32 ± 2% perf-profile.children.cycles-pp._copy_from_user
0.18 ± 9% -0.1 0.13 ± 5% perf-profile.children.cycles-pp.rb_erase
0.27 ± 4% -0.1 0.22 ± 4% perf-profile.children.cycles-pp.__calc_delta
0.18 ± 2% -0.0 0.13 ± 2% perf-profile.children.cycles-pp.__might_fault
0.44 ± 4% -0.0 0.40 ± 4% perf-profile.children.cycles-pp.__entry_text_start
0.09 ± 4% -0.0 0.05 ± 9% perf-profile.children.cycles-pp.__list_add_valid
0.13 ± 4% -0.0 0.09 ± 13% perf-profile.children.cycles-pp.scheduler_tick
0.10 ± 3% -0.0 0.07 ± 7% perf-profile.children.cycles-pp.ktime_get_update_offsets_now
0.07 ± 10% -0.0 0.04 ± 63% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
0.25 ± 3% -0.0 0.22 ± 5% perf-profile.children.cycles-pp.perf_tp_event
0.13 ± 3% -0.0 0.09 ± 7% perf-profile.children.cycles-pp.pick_next_task_idle
0.33 ± 2% -0.0 0.30 ± 4% perf-profile.children.cycles-pp.sem_trywait@@GLIBC_2.2.5
0.10 ± 3% -0.0 0.07 ± 9% perf-profile.children.cycles-pp.__enqueue_entity
0.12 ± 6% -0.0 0.10 ± 4% perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
0.11 ± 6% -0.0 0.09 ± 8% perf-profile.children.cycles-pp.set_next_task_idle
0.10 ± 5% -0.0 0.07 ± 6% perf-profile.children.cycles-pp.perf_trace_buf_update
0.23 ± 2% -0.0 0.21 ± 3% perf-profile.children.cycles-pp.update_min_vruntime
0.12 ± 4% -0.0 0.10 ± 6% perf-profile.children.cycles-pp.__cgroup_account_cputime
0.10 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.syscall_enter_from_user_mode
0.06 ± 6% +0.0 0.08 ± 9% perf-profile.children.cycles-pp.hrtimer_reprogram
0.10 ± 5% +0.0 0.12 ± 7% perf-profile.children.cycles-pp.hrtimer_get_next_event
0.12 ± 6% +0.0 0.16 ± 4% perf-profile.children.cycles-pp.hrtimer_next_event_without
0.13 ± 8% +0.0 0.16 ± 3% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.30 ± 4% +0.0 0.33 ± 5% perf-profile.children.cycles-pp.enqueue_hrtimer
0.10 ± 44% +0.0 0.14 ± 5% perf-profile.children.cycles-pp.cpuidle_governor_latency_req
0.26 ± 2% +0.0 0.30 ± 5% perf-profile.children.cycles-pp.timerqueue_add
0.08 ± 44% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.rcu_eqs_exit
0.06 ± 46% +0.0 0.10 ± 6% perf-profile.children.cycles-pp.call_cpuidle
0.03 ± 82% +0.0 0.07 ± 9% perf-profile.children.cycles-pp.cpumask_next
0.02 ± 99% +0.0 0.07 ± 10% perf-profile.children.cycles-pp.rcu_needs_cpu
0.13 ± 3% +0.0 0.18 ± 3% perf-profile.children.cycles-pp.rcu_idle_exit
0.12 ± 3% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.rcu_dynticks_inc
0.01 ±223% +0.1 0.06 ± 10% perf-profile.children.cycles-pp.menu_reflect
0.17 ± 2% +0.1 0.22 ± 3% perf-profile.children.cycles-pp.tick_nohz_idle_enter
0.19 ± 2% +0.1 0.26 ± 2% perf-profile.children.cycles-pp.get_next_timer_interrupt
0.14 ± 5% +0.1 0.24 ± 4% perf-profile.children.cycles-pp.update_ts_time_stats
0.35 ± 2% +0.1 0.45 ± 4% perf-profile.children.cycles-pp.tick_nohz_next_event
0.13 ± 3% +0.1 0.24 ± 3% perf-profile.children.cycles-pp.nr_iowait_cpu
0.07 ± 5% +0.1 0.17 ± 2% perf-profile.children.cycles-pp.attach_entity_load_avg
0.08 ± 9% +0.1 0.19 ± 8% perf-profile.children.cycles-pp.remove_entity_load_avg
0.08 ± 5% +0.1 0.22 ± 5% perf-profile.children.cycles-pp.cpus_share_cache
0.29 ± 3% +0.1 0.43 perf-profile.children.cycles-pp.check_preempt_curr
0.39 ± 6% +0.1 0.54 ± 5% perf-profile.children.cycles-pp.available_idle_cpu
0.51 +0.1 0.66 ± 3% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length
0.26 ± 14% +0.2 0.42 ± 9% perf-profile.children.cycles-pp.shim_nanosleep_uint64
0.17 ± 4% +0.2 0.35 ± 3% perf-profile.children.cycles-pp.resched_curr
0.28 ± 3% +0.2 0.46 ± 2% perf-profile.children.cycles-pp.ttwu_do_wakeup
0.15 ± 4% +0.2 0.36 ± 4% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.67 ± 2% +0.2 0.89 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
0.97 ± 3% +0.2 1.21 ± 2% perf-profile.children.cycles-pp.__switch_to_asm
0.15 ± 3% +0.2 0.39 ± 3% perf-profile.children.cycles-pp.hrtimer_try_to_cancel
0.13 ± 3% +0.2 0.37 ± 3% perf-profile.children.cycles-pp.hrtimer_active
0.41 ± 9% +0.3 0.68 ± 10% perf-profile.children.cycles-pp.select_idle_cpu
0.85 ± 3% +0.3 1.14 ± 2% perf-profile.children.cycles-pp.__switch_to
1.17 ± 2% +0.3 1.48 ± 4% perf-profile.children.cycles-pp.menu_select
0.63 ± 5% +0.3 0.94 ± 3% perf-profile.children.cycles-pp.restore_fpregs_from_fpstate
1.04 ± 5% +0.4 1.40 ± 3% perf-profile.children.cycles-pp.select_task_rq_fair
0.95 ± 4% +0.4 1.33 ± 3% perf-profile.children.cycles-pp.exit_to_user_mode_prepare
0.87 ± 4% +0.4 1.27 ± 3% perf-profile.children.cycles-pp.switch_fpu_return
0.78 ± 5% +0.4 1.20 ± 4% perf-profile.children.cycles-pp.select_idle_sibling
1.09 ± 4% +0.8 1.93 ± 65% perf-profile.children.cycles-pp.syscall_exit_to_user_mode
1.75 ± 10% +0.9 2.64 ± 5% perf-profile.children.cycles-pp.finish_task_switch
5.67 ± 6% +1.2 6.82 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
0.41 ± 12% +1.2 1.62 ± 3% perf-profile.children.cycles-pp.poll_idle
1.40 ± 3% +1.8 3.20 ± 3% perf-profile.children.cycles-pp.set_next_entity
1.06 ± 10% +2.2 3.30 ± 7% perf-profile.children.cycles-pp.set_task_cpu
3.70 ± 6% +2.4 6.08 ± 4% perf-profile.children.cycles-pp.dequeue_entity
4.07 ± 4% +3.5 7.59 ± 4% perf-profile.children.cycles-pp.update_load_avg
5.19 ± 14% +3.8 8.97 ± 8% perf-profile.children.cycles-pp.update_cfs_group
6.12 ± 6% +3.9 10.00 ± 5% perf-profile.children.cycles-pp.enqueue_task_fair
4.62 ± 3% +3.9 8.55 ± 2% perf-profile.children.cycles-pp.schedule_idle
4.42 ± 6% +4.4 8.83 ± 5% perf-profile.children.cycles-pp.enqueue_entity
5.56 ± 5% +4.4 10.01 ± 5% perf-profile.children.cycles-pp.ttwu_do_activate
12.73 ± 4% +7.9 20.66 ± 4% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
11.85 ± 4% +8.2 20.08 ± 4% perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
11.75 ± 4% +8.2 19.99 ± 4% perf-profile.children.cycles-pp.hrtimer_interrupt
10.74 ± 4% +8.5 19.27 ± 5% perf-profile.children.cycles-pp.__hrtimer_run_queues
9.89 ± 4% +8.7 18.55 ± 5% perf-profile.children.cycles-pp.hrtimer_wakeup
9.83 ± 4% +8.7 18.51 ± 5% perf-profile.children.cycles-pp.try_to_wake_up
50.40 +11.4 61.77 perf-profile.children.cycles-pp.cpuidle_enter
50.37 +11.4 61.75 perf-profile.children.cycles-pp.cpuidle_enter_state
21.64 +13.9 35.50 ± 2% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
43.36 +15.7 59.08 perf-profile.children.cycles-pp.intel_idle
56.87 +15.9 72.76 perf-profile.children.cycles-pp.start_secondary
57.43 +16.1 73.48 perf-profile.children.cycles-pp.do_idle
57.48 +16.1 73.53 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
57.48 +16.1 73.53 perf-profile.children.cycles-pp.cpu_startup_entry
8.27 ± 8% -8.1 0.21 ± 33% perf-profile.self.cycles-pp.update_sd_lb_stats
1.47 ± 11% -1.4 0.06 ± 20% perf-profile.self.cycles-pp.idle_cpu
0.73 ± 7% -0.6 0.09 ± 9% perf-profile.self.cycles-pp._find_next_bit
2.02 ± 5% -0.6 1.38 ± 2% perf-profile.self.cycles-pp._raw_spin_lock
0.63 ± 4% -0.5 0.17 ± 4% perf-profile.self.cycles-pp.cpuidle_enter_state
0.87 ± 5% -0.4 0.48 ± 3% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.40 ± 25% -0.3 0.06 ± 18% perf-profile.self.cycles-pp.update_blocked_averages
0.40 ± 13% -0.3 0.11 ± 4% perf-profile.self.cycles-pp.newidle_balance
0.90 ± 4% -0.2 0.67 ± 4% perf-profile.self.cycles-pp.sem_getvalue@@GLIBC_2.2.5
0.76 ± 2% -0.2 0.54 ± 2% perf-profile.self.cycles-pp.lapic_next_deadline
0.76 ± 2% -0.2 0.55 ± 4% perf-profile.self.cycles-pp.native_sched_clock
0.40 ± 12% -0.2 0.19 ± 2% perf-profile.self.cycles-pp.dequeue_task_fair
0.93 ± 2% -0.2 0.73 ± 3% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.81 ± 7% -0.2 0.64 ± 6% perf-profile.self.cycles-pp.semaphore_posix_thrash
0.60 ± 13% -0.2 0.44 ± 4% perf-profile.self.cycles-pp.enqueue_task_fair
0.40 ± 3% -0.1 0.27 ± 4% perf-profile.self.cycles-pp.reweight_entity
0.60 ± 2% -0.1 0.48 ± 4% perf-profile.self.cycles-pp.__update_load_avg_se
0.83 -0.1 0.71 ± 2% perf-profile.self.cycles-pp.load_new_mm_cr3
0.38 ± 3% -0.1 0.27 ± 2% perf-profile.self.cycles-pp.native_irq_return_iret
0.45 ± 5% -0.1 0.36 ± 3% perf-profile.self.cycles-pp.sem_post@@GLIBC_2.2.5
0.17 ± 13% -0.1 0.09 ± 7% perf-profile.self.cycles-pp.place_entity
0.43 ± 4% -0.1 0.35 ± 3% perf-profile.self.cycles-pp.syscall_return_via_sysret
0.33 ± 3% -0.1 0.26 ± 5% perf-profile.self.cycles-pp.pick_next_task_fair
0.37 ± 2% -0.1 0.29 ± 3% perf-profile.self.cycles-pp.save_fpregs_to_fpstate
0.37 ± 2% -0.1 0.30 ± 6% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.56 ± 3% -0.1 0.49 ± 2% perf-profile.self.cycles-pp.enqueue_entity
0.30 -0.1 0.24 ± 3% perf-profile.self.cycles-pp.pick_next_entity
0.19 ± 7% -0.1 0.12 ± 16% perf-profile.self.cycles-pp.do_syscall_64
0.18 ± 5% -0.1 0.12 ± 5% perf-profile.self.cycles-pp.__x64_sys_nanosleep
0.15 ± 3% -0.1 0.09 ± 5% perf-profile.self.cycles-pp.schedule
0.26 ± 6% -0.1 0.21 ± 3% perf-profile.self.cycles-pp.select_task_rq_fair
0.26 ± 2% -0.1 0.20 ± 5% perf-profile.self.cycles-pp.ktime_get
0.09 ± 9% -0.1 0.04 ± 63% perf-profile.self.cycles-pp.__list_add_valid
0.10 ± 4% -0.0 0.06 ± 8% perf-profile.self.cycles-pp.sched_clock_cpu
0.26 ± 3% -0.0 0.22 ± 4% perf-profile.self.cycles-pp.__calc_delta
0.17 ± 9% -0.0 0.12 ± 5% perf-profile.self.cycles-pp.rb_erase
0.20 ± 3% -0.0 0.16 ± 4% perf-profile.self.cycles-pp.__sched_yield
0.30 ± 2% -0.0 0.26 ± 6% perf-profile.self.cycles-pp.sem_trywait@@GLIBC_2.2.5
0.09 ± 7% -0.0 0.05 ± 6% perf-profile.self.cycles-pp.exit_to_user_mode_prepare
0.38 ± 3% -0.0 0.35 ± 3% perf-profile.self.cycles-pp.do_nanosleep
0.12 ± 6% -0.0 0.10 ± 4% perf-profile.self.cycles-pp.perf_trace_sched_stat_runtime
0.09 ± 4% -0.0 0.07 ± 10% perf-profile.self.cycles-pp.__enqueue_entity
0.11 ± 7% -0.0 0.08 ± 8% perf-profile.self.cycles-pp.__hrtimer_run_queues
0.26 -0.0 0.24 ± 3% perf-profile.self.cycles-pp.try_to_wake_up
0.11 ± 8% -0.0 0.09 ± 3% perf-profile.self.cycles-pp.hrtimer_interrupt
0.22 ± 2% -0.0 0.20 ± 4% perf-profile.self.cycles-pp.update_min_vruntime
0.09 ± 5% -0.0 0.08 ± 4% perf-profile.self.cycles-pp._raw_spin_unlock_irqrestore
0.06 ± 45% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.perf_trace_sched_wakeup_template
0.05 ± 45% +0.0 0.08 ± 10% perf-profile.self.cycles-pp.hrtimer_reprogram
0.10 ± 5% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.select_idle_sibling
0.06 ± 46% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.call_cpuidle
0.02 ± 99% +0.0 0.07 ± 11% perf-profile.self.cycles-pp.rcu_needs_cpu
0.11 ± 7% +0.0 0.16 ± 4% perf-profile.self.cycles-pp.rcu_dynticks_inc
0.00 +0.1 0.05 perf-profile.self.cycles-pp.tick_nohz_idle_exit
0.00 +0.1 0.05 ± 9% perf-profile.self.cycles-pp.rcu_idle_exit
0.00 +0.1 0.07 ± 4% perf-profile.self.cycles-pp.migrate_task_rq_fair
0.08 ± 22% +0.1 0.17 ± 12% perf-profile.self.cycles-pp.poll_idle
0.23 ± 2% +0.1 0.32 ± 2% perf-profile.self.cycles-pp.switch_fpu_return
0.13 ± 5% +0.1 0.23 ± 3% perf-profile.self.cycles-pp.nr_iowait_cpu
0.07 ± 12% +0.1 0.17 ± 2% perf-profile.self.cycles-pp.attach_entity_load_avg
0.42 +0.1 0.54 ± 3% perf-profile.self.cycles-pp.do_idle
0.25 ± 15% +0.1 0.38 ± 9% perf-profile.self.cycles-pp.shim_nanosleep_uint64
0.49 ± 4% +0.1 0.63 ± 5% perf-profile.self.cycles-pp.menu_select
0.08 ± 9% +0.1 0.22 ± 5% perf-profile.self.cycles-pp.cpus_share_cache
0.39 ± 6% +0.1 0.53 ± 5% perf-profile.self.cycles-pp.available_idle_cpu
0.16 ± 2% +0.2 0.34 ± 3% perf-profile.self.cycles-pp.resched_curr
0.66 ± 2% +0.2 0.88 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.12 ± 4% +0.2 0.34 ± 3% perf-profile.self.cycles-pp.hrtimer_active
0.97 ± 3% +0.2 1.20 ± 2% perf-profile.self.cycles-pp.__switch_to_asm
0.27 ± 4% +0.2 0.52 ± 4% perf-profile.self.cycles-pp.set_next_entity
0.83 ± 2% +0.3 1.13 ± 3% perf-profile.self.cycles-pp.__switch_to
0.63 ± 5% +0.3 0.94 ± 3% perf-profile.self.cycles-pp.restore_fpregs_from_fpstate
0.63 ± 13% +2.3 2.98 ± 8% perf-profile.self.cycles-pp.set_task_cpu
2.50 ± 7% +3.5 5.97 ± 5% perf-profile.self.cycles-pp.update_load_avg
5.17 ± 14% +3.8 8.95 ± 8% perf-profile.self.cycles-pp.update_cfs_group
37.34 ± 2% +5.5 42.82 ± 2% perf-profile.self.cycles-pp.intel_idle



stress-ng.time.voluntary_context_switches

3e+08 +-----------------------------------------------------------------+
| |
2.5e+08 |-OO O O O O O OO OO OO O OO OO O OO O O O OO OO |
| O O +. .+.++. +.O .+ .+ .|
|.++.+.++.+.++.+.+ ++ +.+ + + + +.+.++ +.+.+ +.++ |
2e+08 |-+ : :: : : : |
| : :: : :: |
1.5e+08 |-+ + : + : + |
| : : : : |
1e+08 |-+ : : : : |
| : : : : |
| :: :: |
5e+07 |-+ : : |
| : : |
0 +-----------------------------------------------------------------+


stress-ng.time.involuntary_context_switches

6e+07 +-------------------------------------------------------------------+
| |
5e+07 |-+ .+ .+ + |
|.+ .+. +. .+ .+.++. .+ : + : .+. :+ .++. .|
| +.+ + +.++.+ +.+ +.++ : : : + ++.+.: +.+ +.++ |
4e+07 |-+ : : : : + |
| : : : : |
3e+07 |-+ O : : : : |
| O O O OO O O O OO O OO O O::O O: : O OO O O O |
2e+07 |-OO O OO :: :O: O |
| :: :: |
| :: : |
1e+07 |-+ : : |
| : : |
0 +-------------------------------------------------------------------+


stress-ng.sem.ops

6e+08 +-------------------------------------------------------------------+
| |
5e+08 |-OO O O O O O O OO O OO O OO OO O O OO OO O OO O |
| O O .+ .+.++. +.O .+. +. .|
|.++.+.+.++.+.++.+ +.+ +.+ + + + ++.+.+ +.+.+ +.++ |
4e+08 |-+ : :: : : : |
| : :: : :: |
3e+08 |-+ + : + : + |
| : : : : |
2e+08 |-+ : : : : |
| : : :: |
| :: :: |
1e+08 |-+ : :: |
| : : |
0 +-------------------------------------------------------------------+


stress-ng.sem.ops_per_sec

9e+06 +-------------------------------------------------------------------+
| OO O O O O OO O O O O OO O OO O OO O O O O OO O O O |
8e+06 |.++. .+.++. .++.+.++.+.+.++.+. +.+ + +.+. +. +.+.+.+ +.++.|
7e+06 |-+ + + + : :: : + +.+ : : |
| : :: : :: |
6e+06 |-+ : : : : :: |
5e+06 |-+ + : + : + |
| : : : : |
4e+06 |-+ : : : : |
3e+06 |-+ : : :: |
| :: :: |
2e+06 |-+ :: :: |
1e+06 |-+ : : |
| : : |
0 +-------------------------------------------------------------------+


[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected] Intel Corporation

Thanks,
Oliver Sang


Attachments:
(No filename) (71.06 kB)
config-5.16.0-rc1-00009-g8d0920b981b6 (169.42 kB)
job-script (8.30 kB)
job.yaml (5.44 kB)
reproduce (379.00 B)

2021-11-29 16:56:37

by Vincent Donnefort

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

[...]

> > > >
> > > > still i don't see the need of !is_idle_task(current)
> > > >
> > >
> > > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > > which can lead to coscheduling when the wakeup is issued by the idle task
> > > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > > having sent an IPI due to polling). Essentially this overrides the first
> > > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > > target).
> > >
> > > I couldn't prove such wakeups can happen right now, but if/when they do
> > > (AIUI it would just take someone to add a wake_up_process() down some
> > > smp_call_function() callback) then we'll need the above. If you're still
> > > not convinced by now, I won't push it further.
> >
> > From a quick experiment, even with the asym_fits_capacity(), I can trigger
> > the following:
> >
> > [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>
> Timestamp shows its booting phase and thread name above shows per cpu
> thread. Could it happen just while creating per cpu thread at boot and
> as a result not relevant ?

I have more of those logs a bit later in the boot:

[ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
[ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1

The nr_cpus_allowed=8 suggests that none of the threads from the logs I
shared are per-CPU. Sorry if the format is confusing; I used:

wakee=<comm>:<pid> current=<comm>:<pid>.

>
> Can you see similar things later after booting ?

I tried a few scenarios other than boot time, but none of them produced
"current=swapper/X:1 in_task=1".

>
> I have tried to trigger the situation but failed to get wrong
> sequence. All are coming from interrupt while idle.
> After adding in_task() condition, I haven't been able to trigger the
> warn() that I added to catch the wrong situations on SMP, Heterogenous
> or NUMA system. Could you share more details on your setup ?
>

This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
don't simply hit the other issue with asym platforms.

Then I just added my log in the per-CPU kthread wakee-stacking exit path:

printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
       __func__, p->comm, p->pid, p->nr_cpus_allowed, current->comm,
       current->pid, in_task());


[1] https://lore.kernel.org/all/[email protected]/


From the same logs I also see:

wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0

Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
stacking exit path because of the in_task() test?

2021-11-29 19:31:42

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Fri, 26 Nov 2021 at 18:18, Vincent Donnefort
<[email protected]> wrote:
>
> On Fri, Nov 26, 2021 at 04:49:12PM +0000, Valentin Schneider wrote:
> > On 26/11/21 15:40, Vincent Guittot wrote:
> > > On Fri, 26 Nov 2021 at 14:32, Valentin Schneider
> > > <[email protected]> wrote:
> > >> /*
> > >> - * Allow a per-cpu kthread to stack with the wakee if the
> > >> - * kworker thread and the tasks previous CPUs are the same.
> > >> - * The assumption is that the wakee queued work for the
> > >> - * per-cpu kthread that is now complete and the wakeup is
> > >> - * essentially a sync wakeup. An obvious example of this
> > >> + * Allow a per-cpu kthread to stack with the wakee if the kworker thread
> > >> + * and the tasks previous CPUs are the same. The assumption is that the
> > >> + * wakee queued work for the per-cpu kthread that is now complete and
> > >> + * the wakeup is essentially a sync wakeup. An obvious example of this
> > >> * pattern is IO completions.
> > >> + *
> > >> + * Ensure the wakeup is issued by the kthread itself, and don't match
> > >> + * against the idle task because that could override the
> > >> + * available_idle_cpu(target) check done higher up.
> > >> */
> > >> - if (is_per_cpu_kthread(current) &&
> > >> + if (is_per_cpu_kthread(current) && !is_idle_task(current) &&
> > >
> > > still i don't see the need of !is_idle_task(current)
> > >
> >
> > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > which can lead to coscheduling when the wakeup is issued by the idle task
> > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > having sent an IPI due to polling). Essentially this overrides the first
> > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > target).
> >
> > I couldn't prove such wakeups can happen right now, but if/when they do
> > (AIUI it would just take someone to add a wake_up_process() down some
> > smp_call_function() callback) then we'll need the above. If you're still
> > not convinced by now, I won't push it further.
>
> From a quick experiment, even with the asym_fits_capacity(), I can trigger
> the following:
>
> [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1

The timestamps show it's the booting phase, and the thread names above show
per-CPU threads. Could it happen just while creating per-CPU threads at boot,
and as a result not be relevant?

Can you see similar things later, after booting?

I have tried to trigger the situation but failed to get a wrong sequence;
all are coming from interrupts while idle.
After adding the in_task() condition, I haven't been able to trigger the
warn() that I added to catch the wrong situations on SMP, heterogeneous
or NUMA systems. Could you share more details on your setup?


> [ 0.171943] select_idle_sibling: wakee=rcu_sched:10 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>
> So the in_task() condition doesn't appear to be enough to filter wakeups
> while we have the swapper as a current.
>
> >
> > >
> > >> + in_task() &&
> > >> prev == smp_processor_id() &&
> > >> this_rq()->nr_running <= 1) {
> > >> return prev;
> > >>

2021-11-30 13:36:00

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On 29.11.21 17:54, Vincent Donnefort wrote:
> [...]
>
>>>>>
>>>>> still i don't see the need of !is_idle_task(current)
>>>>>
>>>>
>>>> Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
>>>> which can lead to coscheduling when the wakeup is issued by the idle task
>>>> (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
>>>> having sent an IPI due to polling). Essentially this overrides the first
>>>> check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
>>>> target).
>>>>
>>>> I couldn't prove such wakeups can happen right now, but if/when they do
>>>> (AIUI it would just take someone to add a wake_up_process() down some
>>>> smp_call_function() callback) then we'll need the above. If you're still
>>>> not convinced by now, I won't push it further.
>>>
>>> From a quick experiment, even with the asym_fits_capacity(), I can trigger
>>> the following:
>>>
>>> [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>>> [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>>> [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>>> [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>>> [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>>
>> Timestamp shows its booting phase and thread name above shows per cpu
>> thread. Could it happen just while creating per cpu thread at boot and
>> as a result not relevant ?
>
> I have more of those logs a bit later in the boot:
>
> [ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>
> The nr_cpus_allowed=8 suggest that none of the threads from the logs I
> shared are per-CPU. Sorry if the format is confusing, I used:
>
> wakee=<comm>:<pid> current=<comm>:<pid>.
>
>>
>> Can you see similar things later after booting ?
>
> I tried few scenarios other than the boot time but none of them produced
> "current=swapper/X:1 in_task=1"

I don't see them on hikey620 (SMP), not even during boot. I use a
BUG_ON(is_idle_task(current) && in_task()) in sis()'s
`is_per_cpu_kthread` condition.
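
IOW, roughly this (a sketch only; the placement follows the mainline
condition quoted earlier in the thread):

    if (is_per_cpu_kthread(current) &&
        prev == smp_processor_id() &&
        this_rq()->nr_running <= 1) {
            /* per the discussion above, an idle-task wakeup reaching
             * this path from task context would be the suspicious case */
            BUG_ON(is_idle_task(current) && in_task());
            return prev;
    }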

I can only spot `is_idle_task(current)=1` (1) or `in_task()=1` (2):

<idle>-0 [006] dNh3. 274.137473: select_task_rq_fair: (1): is_idle_task(current)=1 in_task()=0 this=6 prev=6 target=6 rq->nr_running=1 p=[task_n10-1 1158] p->cpus_ptr=0-7 current=[swapper/6 0]

[ 104.463685] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.16.0-rc1-00008-g8c92606ab810-dirty #78
[ 104.472385] Hardware name: HiKey Development Board (DT)
[ 104.477627] Call trace:

[ 104.490808] dump_stack+0x1c/0x38
[ 104.494146] select_task_rq_fair+0x1200/0x120c
[ 104.498620] try_to_wake_up+0x168/0x670
[ 104.502486] wake_up_process+0x1c/0x30
[ 104.506260] hrtimer_wakeup+0x24/0x3c
[ 104.509948] __hrtimer_run_queues+0x184/0x36c
[ 104.514330] hrtimer_interrupt+0xec/0x250
[ 104.518365] tick_receive_broadcast+0x30/0x50
[ 104.522751] ipi_handler+0x1dc/0x350


kworker/3:2-87 [003] d..3. 270.954929: select_task_rq_fair: (2): is_idle_task(current)=0 in_task()=1 this=3 prev=3 target=3 rq->nr_running=1 p=[kworker/u16:1 74] p->cpus_ptr=0-7 current=[kworker/3:2 87]

>
>>
>> I have tried to trigger the situation but failed to get wrong
>> sequence. All are coming from interrupt while idle.
>> After adding in_task() condition, I haven't been able to trigger the
>> warn() that I added to catch the wrong situations on SMP, Heterogenous
>> or NUMA system. Could you share more details on your setup ?
>>
>
> This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
> don't simply hit the other issue with asym platforms.
>
> Then I just added my log in the per-CPU kthread wakee stacking exit path
>
> printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
> __func__, p->comm, p->pid, p->nr_cpus_allowed, current->comm, current->pid, in_task());
>
>
> [1] https://lore.kernel.org/all/[email protected]/
>
>
> From the same logs I also see:
>
> wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0
>
> Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
> stacking exit path because of the in_task test?

I get a couple of `is_idle_task(current)=0 && in_task()=0`, mostly with
`current=ksoftirqd/X` and occasionally with `current=kworker/X:1H` or
`current=kworker/X:1`.

ksoftirqd/7-46 [007] d.s4. 330.275122: select_task_rq_fair: (3): is_idle_task(current)=0 in_task()=0 this=7 prev=7 target=7 rq->nr_running=1 p=[kworker/u16:2 75] p->cpus_ptr=0-7 current=[ksoftirqd/7 46]

kworker/7:1H-144 [007] d.h3. 335.284388: select_task_rq_fair: (3): is_idle_task(current)=0 in_task()=0 this=7 prev=7 target=7 rq->nr_running=1 p=[task_n10-1 2397] p->cpus_ptr=0-7 current=[kworker/7:1H 144]

2021-11-30 15:42:44

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Mon, 29 Nov 2021 at 17:54, Vincent Donnefort
<[email protected]> wrote:
>
> [...]
>
> > > > >
> > > > > still i don't see the need of !is_idle_task(current)
> > > > >
> > > >
> > > > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > > > which can lead to coscheduling when the wakeup is issued by the idle task
> > > > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > > > having sent an IPI due to polling). Essentially this overrides the first
> > > > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > > > target).
> > > >
> > > > I couldn't prove such wakeups can happen right now, but if/when they do
> > > > (AIUI it would just take someone to add a wake_up_process() down some
> > > > smp_call_function() callback) then we'll need the above. If you're still
> > > > not convinced by now, I won't push it further.
> > >
> > > From a quick experiment, even with the asym_fits_capacity(), I can trigger
> > > the following:
> > >
> > > [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> >
> > Timestamp shows its booting phase and thread name above shows per cpu
> > thread. Could it happen just while creating per cpu thread at boot and
> > as a result not relevant ?
>
> I have more of those logs a bit later in the boot:
>
> [ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> [ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
>
> The nr_cpus_allowed=8 suggest that none of the threads from the logs I
> shared are per-CPU. Sorry if the format is confusing, I used:
>
> wakee=<comm>:<pid> current=<comm>:<pid>.
>
> >
> > Can you see similar things later after booting ?
>
> I tried few scenarios other than the boot time but none of them produced
> "current=swapper/X:1 in_task=1"
>
> >
> > I have tried to trigger the situation but failed to get wrong
> > sequence. All are coming from interrupt while idle.
> > After adding in_task() condition, I haven't been able to trigger the
> > warn() that I added to catch the wrong situations on SMP, Heterogenous
> > or NUMA system. Could you share more details on your setup ?
> >
>
> This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
> don't simply hit the other issue with asym platforms.

I ran my previous tests on a dragonboard 845c, which is dynamiQ, and I have
tried on my hikey960 since, but without any success so far. This is what I
use:

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6397,9 +6397,12 @@ static int select_idle_sibling(struct task_struct *p, int prev, int target)
	 * essentially a sync wakeup. An obvious example of this
	 * pattern is IO completions.
	 */
-	if (is_per_cpu_kthread(current) &&
+	if (in_task() &&
+	    is_per_cpu_kthread(current) &&
	    prev == smp_processor_id() &&
	    this_rq()->nr_running <= 1) {
+
+		WARN(is_idle_task(current), "idle per cpu kthread: cpu %d task: %s", prev, p->comm);
		return prev;
	}


Without the in_task() condition, I've got warnings from interrupt context
but nothing else.
Note that I don't even have the asym_fits_capacity() condition.
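
(As a reminder, in_task() is true only when we are neither in NMI, hardirq
nor softirq-serving context; roughly, from include/linux/preempt.h:

    #define in_task()	(!(in_nmi() | in_hardirq() | in_serving_softirq()))

so all the interrupt-context wakeups above get filtered out by it.)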

>
> Then I just added my log in the per-CPU kthread wakee stacking exit path
>
> printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
> __func__, p->comm, p->pid, p->nr_cpus_allowed, current->comm, current->pid, in_task());
>
>
> [1] https://lore.kernel.org/all/[email protected]/
>
>
> From the same logs I also see:
>
> wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0
>
> Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
> stacking exit path because of the in_task test?

2021-12-01 14:44:13

by Vincent Donnefort

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Tue, Nov 30, 2021 at 04:42:03PM +0100, Vincent Guittot wrote:
> On Mon, 29 Nov 2021 at 17:54, Vincent Donnefort
> <[email protected]> wrote:
> >
> > [...]
> >
> > > > > >
> > > > > > still i don't see the need of !is_idle_task(current)
> > > > > >
> > > > >
> > > > > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > > > > which can lead to coscheduling when the wakeup is issued by the idle task
> > > > > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > > > > having sent an IPI due to polling). Essentially this overrides the first
> > > > > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > > > > target).
> > > > >
> > > > > I couldn't prove such wakeups can happen right now, but if/when they do
> > > > > (AIUI it would just take someone to add a wake_up_process() down some
> > > > > smp_call_function() callback) then we'll need the above. If you're still
> > > > > not convinced by now, I won't push it further.
> > > >
> > > > From a quick experiment, even with the asym_fits_capacity(), I can trigger
> > > > the following:
> > > >
> > > > [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > >
> > > Timestamp shows its booting phase and thread name above shows per cpu
> > > thread. Could it happen just while creating per cpu thread at boot and
> > > as a result not relevant ?
> >
> > I have more of those logs a bit later in the boot:
> >
> > [ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > [ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> >
> > The nr_cpus_allowed=8 suggest that none of the threads from the logs I
> > shared are per-CPU. Sorry if the format is confusing, I used:
> >
> > wakee=<comm>:<pid> current=<comm>:<pid>.
> >
> > >
> > > Can you see similar things later after booting ?
> >
> > I tried few scenarios other than the boot time but none of them produced
> > "current=swapper/X:1 in_task=1"
> >
> > >
> > > I have tried to trigger the situation but failed to get wrong
> > > sequence. All are coming from interrupt while idle.
> > > After adding in_task() condition, I haven't been able to trigger the
> > > warn() that I added to catch the wrong situations on SMP, Heterogenous
> > > or NUMA system. Could you share more details on your setup ?
> > >
> >
> > This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
> > don't simply hit the other issue with asym platforms.
>
> I ran my previous tests on dragonboard 845c which is dynamiQ and I
> have tried on my hikey960 since but without any success so far. This
> is what i use:
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6397,9 +6397,12 @@ static int select_idle_sibling(struct
> task_struct *p, int prev, int target)
> * essentially a sync wakeup. An obvious example of this
> * pattern is IO completions.
> */
> - if (is_per_cpu_kthread(current) &&
> + if (in_task() &&
> + is_per_cpu_kthread(current) &&
> prev == smp_processor_id() &&
> this_rq()->nr_running <= 1) {
> +
> + WARN(is_idle_task(current), "idle per cpu kthread: cpu
> %d task: %s", prev, p->comm);
> return prev;
> }
>
>
> Without in_task() condition, i've got warnings from interrupt context
> but nothing else.
> Note that I don't even have the asym_fits_capacity() condition

I could not find a setup reproducing that issue outside of boot time. So,
following our conversation, I made a v2 that switches !is_idle_task() to
in_task().
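
IOW the exit path in v2 should end up looking roughly like your test hunk
above, minus the debug WARN() (a sketch only):

    if (in_task() &&
        is_per_cpu_kthread(current) &&
        prev == smp_processor_id() &&
        this_rq()->nr_running <= 1)
            return prev;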

>
> >
> > Then I just added my log in the per-CPU kthread wakee stacking exit path
> >
> > printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
> > __func__, p->comm, p->pid, p->nr_cpus_allowed, current->comm, current->pid, in_task());
> >
> >
> > [1] https://lore.kernel.org/all/[email protected]/
> >
> >
> > From the same logs I also see:
> >
> > wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0
> >
> > Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
> > stacking exit path because of the in_task test?

My bad, I checked and this is not a genuine one...


2021-12-01 16:20:16

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix detection of per-CPU kthreads waking a task

On Wed, 1 Dec 2021 at 15:40, Vincent Donnefort
<[email protected]> wrote:
>
> On Tue, Nov 30, 2021 at 04:42:03PM +0100, Vincent Guittot wrote:
> > On Mon, 29 Nov 2021 at 17:54, Vincent Donnefort
> > <[email protected]> wrote:
> > >
> > > [...]
> > >
> > > > > > >
> > > > > > > still i don't see the need of !is_idle_task(current)
> > > > > > >
> > > > > >
> > > > > > Admittedly, belts and braces. The existing condition checks rq->nr_running <= 1
> > > > > > which can lead to coscheduling when the wakeup is issued by the idle task
> > > > > > (or even if rq->nr_running == 0, you can have rq->ttwu_pending without
> > > > > > having sent an IPI due to polling). Essentially this overrides the first
> > > > > > check in sis() that uses idle_cpu(target) (prev == smp_processor_id() ==
> > > > > > target).
> > > > > >
> > > > > > I couldn't prove such wakeups can happen right now, but if/when they do
> > > > > > (AIUI it would just take someone to add a wake_up_process() down some
> > > > > > smp_call_function() callback) then we'll need the above. If you're still
> > > > > > not convinced by now, I won't push it further.
> > > > >
> > > > > From a quick experiment, even with the asym_fits_capacity(), I can trigger
> > > > > the following:
> > > > >
> > > > > [ 0.118855] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > > [ 0.128214] select_idle_sibling: wakee=rcu_gp:3 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > > [ 0.137327] select_idle_sibling: wakee=rcu_par_gp:4 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > > [ 0.147221] select_idle_sibling: wakee=kworker/u16:0:7 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > > > [ 0.156994] select_idle_sibling: wakee=mm_percpu_wq:8 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > >
> > > > Timestamp shows its booting phase and thread name above shows per cpu
> > > > thread. Could it happen just while creating per cpu thread at boot and
> > > > as a result not relevant ?
> > >
> > > I have more of those logs a bit later in the boot:
> > >
> > > [ 0.484791] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.516495] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.525758] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.535078] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.547486] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > > [ 0.579192] select_idle_sibling: wakee=kthreadd:2 nr_cpus_allowed=8 current=swapper/0:1 in_task=1
> > >
> > > The nr_cpus_allowed=8 suggest that none of the threads from the logs I
> > > shared are per-CPU. Sorry if the format is confusing, I used:
> > >
> > > wakee=<comm>:<pid> current=<comm>:<pid>.
> > >
> > > >
> > > > Can you see similar things later after booting ?
> > >
> > > I tried few scenarios other than the boot time but none of them produced
> > > "current=swapper/X:1 in_task=1"
> > >
> > > >
> > > > I have tried to trigger the situation but failed to get wrong
> > > > sequence. All are coming from interrupt while idle.
> > > > After adding in_task() condition, I haven't been able to trigger the
> > > > warn() that I added to catch the wrong situations on SMP, Heterogenous
> > > > or NUMA system. Could you share more details on your setup ?
> > > >
> > >
> > > This is just my Hikey960 with the asym_fits_capacity() fix [1] to make sure I
> > > don't simply hit the other issue with asym platforms.
> >
> > I ran my previous tests on dragonboard 845c which is dynamiQ and I
> > have tried on my hikey960 since but without any success so far. This
> > is what i use:
> >
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -6397,9 +6397,12 @@ static int select_idle_sibling(struct
> > task_struct *p, int prev, int target)
> > * essentially a sync wakeup. An obvious example of this
> > * pattern is IO completions.
> > */
> > - if (is_per_cpu_kthread(current) &&
> > + if (in_task() &&
> > + is_per_cpu_kthread(current) &&
> > prev == smp_processor_id() &&
> > this_rq()->nr_running <= 1) {
> > +
> > + WARN(is_idle_task(current), "idle per cpu kthread: cpu
> > %d task: %s", prev, p->comm);
> > return prev;
> > }
> >
> >
> > Without in_task() condition, i've got warnings from interrupt context
> > but nothing else.
> > Note that I don't even have the asym_fits_capacity() condition
>
> I could not find a setup reproducing that issue outside of the boot time. So
> following our conversation, I made a v2 that switch !is_idle_task() to in_task().

Ok.
Thanks

>
> >
> > >
> > > Then I just added my log in the per-CPU kthread wakee stacking exit path
> > >
> > > printk("%s: wakee=%s:%d nr_cpus_allowed=%d current=%s:%d in_task=%d\n",
> > > __func__, p->comm, p->pid, p->nr_cpus_allowed, current->comm, current->pid, in_task());
> > >
> > >
> > > [1] https://lore.kernel.org/all/[email protected]/
> > >
> > >
> > > From the same logs I also see:
> > >
> > > wakee=xfsaild/mmcblk0:4855 nr_cpus_allowed=8 current=kworker/1:1:1070 in_task=0
> > >
> > > Doesn't that look like a genuine wakeup that would escape the per-CPU kthread
> > > stacking exit path because of the in_task test?
>
> My bad, I checked and this is not a genuine one...
>