2022-04-11 14:05:07

by David Laight

[permalink] [raw]
Subject: Scheduling tasks on idle cpu

From: Qais Yousef
> Sent: 09 April 2022 18:09
...
> RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> there's another cpu at lower priority available

Does that actually happen?
I've seen the following:
34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
This is in spite of there being 5 idle cpus.
cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.

This is on a RHEL7 kernel; I've not replicated it on anything recent.
But I'd very much like an RT thread to be able to schedule a non-RT
thread to run on an idle cpu.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


2022-04-12 00:48:50

by Steven Rostedt

Subject: Re: Scheduling tasks on idle cpu

On Mon, 11 Apr 2022 08:26:33 +0000
David Laight <[email protected]> wrote:

> Does that actually happen?
> I've seen the following:
> 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
> and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> This is in spite of there being 5 idle cpus.

What's the topology? I believe the scheduler will refrain from
migrating tasks to idle CPUs that are on other NUMA nodes as much as
possible. Were those other 5 idle CPUs on another node?

-- Steve


> cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
>
> This is on a RHEL7 kernel; I've not replicated it on anything recent.
> But I'd very much like an RT thread to be able to schedule a non-RT
> thread to run on an idle cpu.

2022-04-12 07:00:08

by Qais Yousef

Subject: Re: Scheduling tasks on idle cpu

On 04/11/22 08:26, David Laight wrote:
> From: Qais Yousef
> > Sent: 09 April 2022 18:09
> ...
> > RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> > there's another cpu at lower priority available
>
> Does that actually happen?

For RT tasks, yes. They should get distributed.

> I've seen the following:
> 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017

prio=120 is a CFS task, no?

> and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> This is in spite of there being 5 idle cpus.
> cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
>
> This is on a RHEL7 kernel; I've not replicated it on anything recent.
> But I'd very much like an RT thread to be able to schedule a non-RT
> thread to run on an idle cpu.

Oh, you want CFS to avoid CPUs that are running RT tasks.

We had a proposal in the past, but it wasn't good enough:

https://lore.kernel.org/lkml/[email protected]/

The approach in that patch modified RT to avoid CFS actually.

Can you verify whether the RT task woke up after task 34512 was migrated to CPU
17? Looking at the definition of available_idle_cpu() we should have avoided
that CPU if the RT task was already running. Both waking up at the same time
would explain what you see. Otherwise I'm not sure why it picked CPU 17.

Thanks

--
Qais Yousef

2022-04-12 09:11:10

by David Laight

Subject: RE: Scheduling tasks on idle cpu

From: Steven Rostedt
> Sent: 11 April 2022 10:27
>
> On Mon, 11 Apr 2022 08:26:33 +0000
> David Laight <[email protected]> wrote:
>
> > Does that actually happen?
> > I've seen the following:
> > 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> > 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> > 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
> > and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> > This is in spite of there being 5 idle cpus.
>
> What's the topology? I believe the scheduler will refrain from
> migrating tasks to idle CPUs that are on other NUMA nodes as much as
> possible. Were those other 5 idle CPUs on another node?

There are two physical cpus with 20 cores each (with hyperthreading).
16, 18, 34, 36 and 38 were idle.
So both 16 and 18 should be on the same NUMA node.
All the others are running the same RT thread code.

David

>
> -- Steve
>
>
> > cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
> >
> > This is on a RHEL7 kernel; I've not replicated it on anything recent.
> > But I'd very much like an RT thread to be able to schedule a non-RT
> > thread to run on an idle cpu.


2022-04-12 22:09:26

by Vincent Guittot

Subject: Re: Scheduling tasks on idle cpu

On Tue, 12 Apr 2022 at 10:39, David Laight <[email protected]> wrote:
>
> From: Qais Yousef
> > Sent: 12 April 2022 00:35
> >
> > On 04/11/22 08:26, David Laight wrote:
> > > From: Qais Yousef
> > > > Sent: 09 April 2022 18:09
> > > ...
> > > > RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> > > > there's another cpu at lower priority available
> > >
> > > Does that actually happen?
> >
> > For RT tasks, yes. They should get distributed.
>
> Ok, that is something slightly different from what I'm seeing.
>
> > > I've seen the following:
> > > 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> > > 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> > > 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
> >
> > prio=120 is a CFS task, no?
>
> CFS = 'normal' time-slice processes? Then yes.
>
> > > and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> > > This is in spite of there being 5 idle cpus.
> > > cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
> > >
> > > This is on a RHEL7 kernel; I've not replicated it on anything recent.
> > > But I'd very much like an RT thread to be able to schedule a non-RT
> > > thread to run on an idle cpu.
> >
> > Oh, you want CFS to avoid CPUs that are running RT tasks.
> >
> > We had a proposal in the past, but it wasn't good enough
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> That seems to be something different.
> Related to something else I've seen where a RT process is scheduled
> on its old cpu (to get the hot cache) but the process running on
> that cpu is looping in kernel - so the RT process doesn't start.
>
> I've avoided most of the pain that caused by not using a single
> cv_broadcast() to wake up the 34 RT threads (in this config).
> (Each kernel thread seemed to wake up the next one, so the
> delays were cumulative.)
> Instead there is a separate cv for each RT thread.
> I actually want the 'herd of wildebeest' :-)
>
> > The approach in that patch modified RT to avoid CFS actually.
>
> Yes I want the CFS scheduler to pick an idle cpu in preference
> to an active RT one.

When task 34512 wakes up, the scheduler checks whether prev or this cpu is
idle, which is not the case for you. Then it compares the load of prev
and this_cpu and seems to select this_cpu (cpu17).

Once cpu17 is selected, it will try to find an idle cpu which shares the LLC,
but it seems that the scheduler didn't find one and finally keeps task
34512 on this_cpu.

Note that during the next tick, a load balance will be triggered if
this_cpu still has both the RT task and task 34512.

>
> > Can you verify whether the RT task woke up after task 34512 was migrated to CPU
> > 17? Looking at the definition of available_idle_cpu() we should have avoided
> > that CPU if the RT task was already running. Both waking up at the same time
> > would explain what you see. Otherwise I'm not sure why it picked CPU 17.
>
> All 35 RT tasks are running when the request to schedule task 34512 is made.
> (They wake every 10ms to process UDP/RTP audio packets.)
> The RT task on cpu 17 carried on running until it ran out of work (after about 1ms).
> Task 34512 then ran on cpu 17.
>
> In this case task 34512 actually finished quite quickly.
> (It is creating and binding more UDP sockets.)
> But it looks like if it were still running on the next 10ms 'tick'
> it would be pre-empted by the RT task and be idle.
> Not ideal when I'm trying to schedule a background activity.
>
> I don't think the load-balancer will ever pick it up.
> All the process scheduling is happening far too fast.
>
> What I think might be happening is that the futex() code is requesting
> the woken up thread run on the current cpu.
> This can be advantageous in some circumstances - usually if you
> know the current thread is about to sleep.
> (I remember another scheduler doing that, but I can't remember why!
> The only sequence I can think of is a shell doing fork+exec+wait.)
> But it seems like a bad idea when a RT thread is waking a CFS one.
> (Or any case where the one being woken is lower priority.)
>
> I might have to run the 'background tasks' at low RT priority
> just to get them scheduled on idle cpu.
>
> David
>

2022-04-12 22:49:45

by David Laight

Subject: RE: Scheduling tasks on idle cpu

From: Qais Yousef
> Sent: 12 April 2022 00:35
>
> On 04/11/22 08:26, David Laight wrote:
> > From: Qais Yousef
> > > Sent: 09 April 2022 18:09
> > ...
> > > RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> > > there's another cpu at lower priority available
> >
> > Does that actually happen?
>
> For RT tasks, yes. They should get distributed.

Ok, that is something slightly different from what I'm seeing.

> > I've seen the following:
> > 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> > 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> > 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
>
> prio=120 is a CFS task, no?

CFS = 'normal' time-slice processes? Then yes.

> > and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> > This is in spite of there being 5 idle cpus.
> > cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
> >
> > This is on a RHEL7 kernel; I've not replicated it on anything recent.
> > But I'd very much like an RT thread to be able to schedule a non-RT
> > thread to run on an idle cpu.
>
> Oh, you want CFS to avoid CPUs that are running RT tasks.
>
> We had a proposal in the past, but it wasn't good enough
>
> https://lore.kernel.org/lkml/[email protected]/

That seems to be something different.
It's related to something else I've seen, where an RT process is scheduled
on its old cpu (to get the hot cache) but the process running on
that cpu is looping in the kernel, so the RT process doesn't start.

I've avoided most of the pain this caused by not using a single
cv_broadcast() to wake up the 34 RT threads (in this config).
(Each kernel thread seemed to wake up the next one, so the
delays were cumulative.)
Instead there is a separate cv for each RT thread.
I actually want the 'herd of wildebeest' :-)

> The approach in that patch modified RT to avoid CFS actually.

Yes I want the CFS scheduler to pick an idle cpu in preference
to an active RT one.

> Can you verify whether the RT task woke up after task 34512 was migrated to CPU
> 17? Looking at the definition of available_idle_cpu() we should have avoided
> that CPU if the RT task was already running. Both waking up at the same time
> would explain what you see. Otherwise I'm not sure why it picked CPU 17.

All 35 RT tasks are running when the request to schedule task 34512 is made.
(They wake every 10ms to process UDP/RTP audio packets.)
The RT task on cpu 17 carried on running until it ran out of work (after about 1ms).
Task 34512 then ran on cpu 17.

In this case task 34512 actually finished quite quickly.
(It is creating and binding more UDP sockets.)
But it looks like, if it were still running at the next 10ms 'tick',
it would be pre-empted by the RT task and left idle.
Not ideal when I'm trying to schedule a background activity.

I don't think the load-balancer will ever pick it up.
All the process scheduling is happening far too fast.

What I think might be happening is that the futex() code is requesting
the woken up thread run on the current cpu.
This can be advantageous in some circumstances - usually if you
know the current thread is about to sleep.
(I remember another scheduler doing that, but I can't remember why!
The only sequence I can think of is a shell doing fork+exec+wait.)
But it seems like a bad idea when a RT thread is waking a CFS one.
(Or any case where the one being woken is lower priority.)

I might have to run the 'background tasks' at low RT priority
just to get them scheduled on idle cpu.

David


2022-04-14 12:20:42

by Qais Yousef

Subject: Re: Scheduling tasks on idle cpu

On 04/12/22 08:39, David Laight wrote:
> From: Qais Yousef
> > Sent: 12 April 2022 00:35
> >
> > On 04/11/22 08:26, David Laight wrote:
> > > From: Qais Yousef
> > > > Sent: 09 April 2022 18:09
> > > ...
> > > > RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> > > > there's another cpu at lower priority available
> > >
> > > Does that actually happen?
> >
> > For RT tasks, yes. They should get distributed.
>
> Ok, that is something slightly different from what I'm seeing.

If you have multiple SCHED_FIFO/SCHED_RR tasks with the same priority, they
don't end up being distributed on different CPUs? Assuming number of tasks is
not higher than number of CPUs.

Generally if there are two RT tasks on the same CPU and there's another CPU
that is running something lower priority than both of them, then the
lower-priority of the two tasks should move to that CPU.

Eh, hope that's readable :-)

>
> > > I've seen the following:
> > > 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3: 4000001)
> > > 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> > > 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
> >
> > prio=120 is a CFS task, no?
>
> CFS = 'normal' time-slice processes? Then yes.

Sorry, yes. CFS = SCHED_NORMAL/SCHED_OTHER.

>
> > > and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> > > This is in spite of there being 5 idle cpus.
> > > cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
> > >
> > > This is on a RHEL7 kernel; I've not replicated it on anything recent.
> > > But I'd very much like an RT thread to be able to schedule a non-RT
> > > thread to run on an idle cpu.
> >
> > Oh, you want CFS to avoid CPUs that are running RT tasks.
> >
> > We had a proposal in the past, but it wasn't good enough
> >
> > https://lore.kernel.org/lkml/[email protected]/
>
> That seems to be something different.
> Related to something else I've seen where a RT process is scheduled
> on its old cpu (to get the hot cache) but the process running on
> that cpu is looping in kernel - so the RT process doesn't start.

I *think* you're hitting softirq latencies. Most likely it's the network RX
softirq processing the packets. If this latency is a problem, then PREEMPT_RT
[1] should help with this. For Android we hit this issue and there's a
long-lived out-of-tree patch that I'm trying to find an upstream replacement for.

There's a new knob to reduce how long netdev spends in the loop. Might be worth
a try:

https://lore.kernel.org/netdev/[email protected]/

[1] https://wiki.linuxfoundation.org/realtime/start

>
> I've avoided most of the pain that caused by not using a single
> cv_broadcast() to wake up the 34 RT threads (in this config).
> (Each kernel thread seemed to wake up the next one, so the
> delays were cumulative.)
> Instead there is a separate cv for each RT thread.
> I actually want the 'herd of wildebeest' :-)

It seems you have a big RT app running in userspace. I initially thought you
were hitting issues with random kthreads or something. If you have control over
these tasks, then that should be easier to handle (as you suggest at the end).

I'm not sure about the delays when using cv_broadcast(). Could it be the way
this library is implemented is causing the problem rather than a kernel
limitation?

>
> > The approach in that patch modified RT to avoid CFS actually.
>
> Yes I want the CFS scheduler to pick an idle cpu in preference
> to an active RT one.

I think that's what should happen. But I think it's racy. Vincent knows this
code better though, so I'll defer to him.

>
> > Can you verify whether the RT task woke up after task 34512 was migrated to CPU
> > 17? Looking at the definition of available_idle_cpu() we should have avoided
> > that CPU if the RT task was already running. Both waking up at the same time
> > would explain what you see. Otherwise I'm not sure why it picked CPU 17.
>
> All 35 RT tasks are running when the request to schedule task 34512 is made.
> (They wake every 10ms to process UDP/RTP audio packets.)
> The RT task on cpu 17 carried on running until it ran out of work (after about 1ms).
> Task 34512 then ran on cpu 17.
>
> In this case task 34512 actually finished quite quickly.
> (It is creating and binding more UDP sockets.)
> But it looks like if it were still running on the next 10ms 'tick'
> it would be pre-empted by the RT task and be idle.
> Not ideal when I'm trying to schedule a background activity.
>
> I don't think the load-balancer will ever pick it up.
> All the process scheduling is happening far too fast.
>
> What I think might be happening is that the futex() code is requesting
> the woken up thread run on the current cpu.

Hmm. Looking at kernel/futex/waitwake.c::futex_wake() it just ends up calling
wake_up_process(). So that might not be the case.

> This can be advantageous in some circumstances - usually if you
> know the current thread is about to sleep.
> (I remember another scheduler doing that, but I can't remember why!
> The only sequence I can think of is a shell doing fork+exec+wait.)
> But it seems like a bad idea when a RT thread is waking a CFS one.
> (Or any case where the one being woken is lower priority.)
>
> I might have to run the 'background tasks' at low RT priority
> just to get them scheduled on idle cpu.

If you make it an RT task (which I think is a good idea), then the RT scheduler
will handle it via the push/pull mechanism that seems to have started this
discussion, and it will get pushed/pulled to another CPU that is running a
lower-priority task.

Cheers

--
Qais Yousef

2022-04-14 13:34:17

by David Laight

Subject: RE: Scheduling tasks on idle cpu

From: Qais Yousef
> Sent: 14 April 2022 00:51
>
> On 04/12/22 08:39, David Laight wrote:
> > From: Qais Yousef
> > > Sent: 12 April 2022 00:35
> > >
> > > On 04/11/22 08:26, David Laight wrote:
> > > > From: Qais Yousef
> > > > > Sent: 09 April 2022 18:09
> > > > ...
> > > > > RT scheduler will push/pull tasks to ensure the task will get to run ASAP if
> > > > > there's another cpu at lower priority available
> > > >
> > > > Does that actually happen?
> > >
> > > For RT tasks, yes. They should get distributed.
> >
> > Ok, that is something slightly different from what I'm seeing.
>
> If you have multiple SCHED_FIFO/SCHED_RR tasks with the same priority, they
> don't end up being distributed on different CPUs? Assuming number of tasks is
> not higher than number of CPUs.
>
> Generally if there are two RT tasks on the same CPU and there's another CPU
> that is running something that is lower priority than these two, then the lower
> priority of these 2 tasks should move to that CPU.
>
> Eh, hope that's readable :-)

That is (just about) readable, and is happening.

> > > > I've seen the following:
> > > > 34533 [017]: sys_futex(uaddr: 1049104, op: 85, val: 1, utime: 1, uaddr2: 1049100, val3:
> 4000001)
> > > > 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> > > > 34533 [017]: sched_wakeup: pid=34512 prio=120 success=1 target_cpu=017
> > >
> > > prio=120 is a CFS task, no?
> >
> > CFS = 'normal' time-slice processes? Then yes.
>
> Sorry, yes. CFS = SCHED_NORMAL/SCHED_OTHER.
>
> >
> > > > and pid 34512 doesn't get scheduled until pid 34533 finally sleeps.
> > > > This is in spite of there being 5 idle cpus.
> > > > cpu 14 is busy running an RT thread, but migrating to cpu 17 seems wrong.
> > > >
> > > > This is on a RHEL7 kernel; I've not replicated it on anything recent.
> > > > But I'd very much like an RT thread to be able to schedule a non-RT
> > > > thread to run on an idle cpu.
> > >
> > > Oh, you want CFS to avoid CPUs that are running RT tasks.
> > >
> > > We had a proposal in the past, but it wasn't good enough
> > >
> > > https://lore.kernel.org/lkml/[email protected]/
> >
> > That seems to be something different.
> > Related to something else I've seen where a RT process is scheduled
> > on its old cpu (to get the hot cache) but the process running on
> > that cpu is looping in kernel - so the RT process doesn't start.
>
> I *think* you're hitting softirq latencies. Most likely it's the network RX
> softirq processing the packets. If this latency is a problem, then PREEMPT_RT
> [1] should help with this. For Android we hit this issue and there's a long
> living out of tree patch that I'm trying to find an upstream replacement for.

I suspect the costs of PREEMPT_RT would slow things down too much.
This test system has 40 cpus, 35 of them running RT threads processing the same 'jobs'.
It doesn't really matter if one is delayed by the network irq + softirq code.
The problems arise if they all stop.
The 'job' list was protected by a mutex - usually not too bad.
But if a network irq interrupts the code while it holds the mutex then all
the RT tasks stall until the softirq code completes.
I've replaced the linked list with an array and used atomic_inc().

I can imagine that a PREEMPT_RT kernel will have the same problem
because (I think) all the spin locks get replaced by sleep locks.

>
> There's a new knob to reduce how long netdev spends in the loop. Might be worth
> a try:
>
> https://lore.kernel.org/netdev/[email protected]/
>
> [1] https://wiki.linuxfoundation.org/realtime/start

I think the patch that runs the softirq in a separate thread might help.
But it probably needs a test to only do that if it would 'stall' an RT process.

> > I've avoided most of the pain that caused by not using a single
> > cv_broadcast() to wake up the 34 RT threads (in this config).
> > (Each kernel thread seemed to wake up the next one, so the
> > delays were cumulative.)
> > Instead there is a separate cv for each RT thread.
> > I actually want the 'herd of wildebeest' :-)
>
> It seems you have a big RT app running in userspace. I thought initially you're
> hitting issues with random kthreads or something. If you have control over
> these tasks, then that should be easier to handle (as you suggest at the end).

I've a big app with a lot of RT threads doing network send/receive.
(All the packets are ~200-byte UDP, 50/sec on 1000+ port numbers.)
But there are other things going on as well.

> I'm not sure about the delays when using cv_broadcast(). Could it be the way
> this library is implemented is causing the problem rather than a kernel
> limitation?

I was definitely seeing the threads wake up one by one.
Every 10ms one of the RT threads wakes up and then wakes up all the others.
There weren't any 'extra' system calls; once one thread was running
in the kernel the next one got woken up.
Most noticeable (and always present) were the delays getting each cpu out
of its sleep state.
But if one of the required cpus was (e.g.) running the softirq code,
none of the later ones would wake up.

> > > The approach in that patch modified RT to avoid CFS actually.
> >
> > Yes I want the CFS scheduler to pick an idle cpu in preference
> > to an active RT one.
>
> I think that's what should happen. But I think it's racy. Vincent knows this
> code better though, so I'll defer to him.
>
> >
> > > Can you verify whether the RT task woke up after task 34512 was migrated to CPU
> > > 17? Looking at the definition of available_idle_cpu() we should have avoided
> > > that CPU if the RT task was already running. Both waking up at the same time
> > > would explain what you see. Otherwise I'm not sure why it picked CPU 17.
> >
> > All 35 RT tasks are running when the request to schedule task 34512 is made.
> > (They wake every 10ms to process UDP/RTP audio packets.)
> > The RT task on cpu 17 carried on running until it ran out of work (after about 1ms).
> > Task 34512 then ran on cpu 17.
> >
> > In this case task 34512 actually finished quite quickly.
> > (It is creating and binding more UDP sockets.)
> > But it looks like if it were still running on the next 10ms 'tick'
> > it would be pre-empted by the RT task and be idle.
> > Not ideal when I'm trying to schedule a background activity.
> >
> > I don't think the load-balancer will ever pick it up.
> > All the process scheduling is happening far too fast.
> >
> > What I think might be happening is that the futex() code is requesting
> > the woken up thread run on the current cpu.
>
> Hmm. Looking at kernel/futex/waitwake.c::futex_wake() it just ends up calling
> wake_up_process(). So that might not be the case.
>
> > This can be advantageous in some circumstances - usually if you
> > know the current thread is about to sleep.
> > (I remember another scheduler doing that, but I can't remember why!
> > The only sequence I can think of is a shell doing fork+exec+wait.)
> > But it seems like a bad idea when a RT thread is waking a CFS one.
> > (Or any case where the one being woken is lower priority.)
> >
> > I might have to run the 'background tasks' at low RT priority
> > just to get them scheduled on idle cpu.
>
> If you make it an RT task (which I think is a good idea), then the RT scheduler
> will handle it in the push/pull remark that seem to have started this
> discussion and get pushed/pulled to another CPU that is running lower priority
> task.

The problem is that while I'd like this thread to start immediately
what it is doing isn't THAT important.
There are other things that might run on the CFS scheduler that are
more important.
I can make it RT for experiments.

David


2022-04-14 18:59:48

by Qais Yousef

Subject: Re: Scheduling tasks on idle cpu

On 04/12/22 11:07, Vincent Guittot wrote:
> On Tue, 12 Apr 2022 at 10:39, David Laight <[email protected]> wrote:
> > Yes I want the CFS scheduler to pick an idle cpu in preference
> > to an active RT one.
>
> When task 34512 wakes up, scheduler checks if prev or this cpu are
> idle which is not the case for you. Then, it compares the load of prev
> and this_cpu and seems to select this_cpu (cpu17).
>
> Once cpu17 selected, it will try to find an idle cpu which shares LLC
> but it seems that the scheduler didn't find one and finally keeps task
> 34512 on this_cpu.
>
> Note that during the next tick, a load balance will be trigger if
> this_cpu still have both RT and task 34512,

David said there are idle cpus

" There are two physical cpu with 20 cores each (with hyperthreading).
16, 18, 34, 36 and 38 were idle. So both 16 and 18 should be on the
same NUMA node. All the others are running the same RT thread code. "

Except for the possibility of them becoming idle just after the task has woken
up, shouldn't one of them have been picked?

Thanks

--
Qais Yousef

2022-04-14 20:59:43

by David Laight

Subject: RE: Scheduling tasks on idle cpu

From: Vincent Guittot
> Sent: 14 April 2022 08:54
>
> On Thu, 14 Apr 2022 at 01:57, Qais Yousef <[email protected]> wrote:
> >
> > On 04/12/22 11:07, Vincent Guittot wrote:
> > > On Tue, 12 Apr 2022 at 10:39, David Laight <[email protected]> wrote:
> > > > Yes I want the CFS scheduler to pick an idle cpu in preference
> > > > to an active RT one.
> > >
> > > When task 34512 wakes up, scheduler checks if prev or this cpu are
> > > idle which is not the case for you. Then, it compares the load of prev
> > > and this_cpu and seems to select this_cpu (cpu17).
> > >
> > > Once cpu17 selected, it will try to find an idle cpu which shares LLC
> > > but it seems that the scheduler didn't find one and finally keeps task
> > > 34512 on this_cpu.
> > >
> > > Note that during the next tick, a load balance will be trigger if
> > > this_cpu still have both RT and task 34512,
> >
> > David said there are idle cpus
> >
> > " There are two physical cpu with 20 cores each (with hyperthreading).
> > 16, 18, 34, 36 and 38 were idle. So both 16 and 18 should be on the
> > same NUMA node. All the others are running the same RT thread code. "
> >
> > Except for the possibility of them becoming idle just after the task has woken
> > up, shouldn't one of them have been picked?
>
> we don't loop on all cpus in the LLC to find an idle one but compute a
> reasonable number of iterations based on the avg_idle

Is there a way to dump the kernel NUMA/LLC tables?
This might be relevant (with everything idle):
# cat /proc/schedstat
version 15
timestamp 5388989193
cpu0 0 0 0 0 0 0 117226041384582 250531565354 206276873
domain0 00,00100001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 55,55555555 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 115978661288718 251736933814 297093280
domain0 00,00200002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain1 aa,aaaaaaaa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
All the later cpus follow the same pattern (domain0 shifts left every cpu).

I could interpret that as meaning:
cpu n and (n + 20) are the hyperthreading pairs.
Even numbered cpus are on one chip, odd numbered ones on the other.

The migrate was:
34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
All the idle cpus were even.

> David can rerun his use case after disabling sched_feat(SIS_PROP)

How would I do that?

David


2022-04-14 21:03:48

by Vincent Guittot

Subject: Re: Scheduling tasks on idle cpu

On Thu, 14 Apr 2022 at 10:35, David Laight <[email protected]> wrote:
>
> From: Vincent Guittot
> > Sent: 14 April 2022 08:54
> >
> > On Thu, 14 Apr 2022 at 01:57, Qais Yousef <[email protected]> wrote:
> > >
> > > On 04/12/22 11:07, Vincent Guittot wrote:
> > > > On Tue, 12 Apr 2022 at 10:39, David Laight <[email protected]> wrote:
> > > > > Yes I want the CFS scheduler to pick an idle cpu in preference
> > > > > to an active RT one.
> > > >
> > > > When task 34512 wakes up, scheduler checks if prev or this cpu are
> > > > idle which is not the case for you. Then, it compares the load of prev
> > > > and this_cpu and seems to select this_cpu (cpu17).
> > > >
> > > > Once cpu17 is selected, it will try to find an idle cpu which shares the
> > > > LLC, but it seems that the scheduler didn't find one and finally keeps
> > > > task 34512 on this_cpu.
> > > >
> > > > Note that during the next tick, a load balance will be triggered if
> > > > this_cpu still has both RT and task 34512,
> > >
> > > David said there are idle cpus
> > >
> > > " There are two physical cpu with 20 cores each (with hyperthreading).
> > > 16, 18, 34, 36 and 38 were idle. So both 16 and 18 should be on the
> > > same NUMA node. All the others are running the same RT thread code. "
> > >
> > > Except for the possibility of them becoming idle just after the task has woken
> > > up, shouldn't one of them have been picked?
> >
> > we don't loop on all cpus in the LLC to find an idle one but compute a
> > reasonable number of iterations based on the avg_idle
>
> Is there a way to dump the kernel NUMA/LLC tables?
> This might be relevant (with everything idle):
> # cat /proc/schedstat
> version 15
> timestamp 5388989193
> cpu0 0 0 0 0 0 0 117226041384582 250531565354 206276873
> domain0 00,00100001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain1 55,55555555 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> cpu1 0 0 0 0 0 0 115978661288718 251736933814 297093280
> domain0 00,00200002 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain1 aa,aaaaaaaa 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> domain2 ff,ffffffff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> All the later cpu follow the same pattern (domain0 shifts left every cpu).
>
> I could interpret that as meaning:
> cpu n and (n + 20) are the hyperthreading pairs.
> Even numbered cpu are on one chip, odd numbered on the other.
>
> The migrate was:
> 34533 [017]: sched_migrate_task: pid=34512 prio=120 orig_cpu=14 dest_cpu=17
> All the idle cpu were even.
>
> > David can rerun his use case after disabling sched_feat(SIS_PROP)
>
> How would I do that?

echo NO_SIS_PROP > /sys/kernel/debug/sched/features

>
> David
>

2022-04-15 15:48:52

by Vincent Guittot

[permalink] [raw]
Subject: Re: Scheduling tasks on idle cpu

On Thu, 14 Apr 2022 at 01:57, Qais Yousef <[email protected]> wrote:
>
> On 04/12/22 11:07, Vincent Guittot wrote:
> > On Tue, 12 Apr 2022 at 10:39, David Laight <[email protected]> wrote:
> > > Yes I want the CFS scheduler to pick an idle cpu in preference
> > > to an active RT one.
> >
> > When task 34512 wakes up, scheduler checks if prev or this cpu are
> > idle which is not the case for you. Then, it compares the load of prev
> > and this_cpu and seems to select this_cpu (cpu17).
> >
> > Once cpu17 is selected, it will try to find an idle cpu which shares the
> > LLC, but it seems that the scheduler didn't find one and finally keeps
> > task 34512 on this_cpu.
> >
> > Note that during the next tick, a load balance will be triggered if
> > this_cpu still has both RT and task 34512,
>
> David said there are idle cpus
>
> " There are two physical cpu with 20 cores each (with hyperthreading).
> 16, 18, 34, 36 and 38 were idle. So both 16 and 18 should be on the
> same NUMA node. All the others are running the same RT thread code. "
>
> Except for the possibility of them becoming idle just after the task has woken
> up, shouldn't one of them have been picked?

we don't loop on all cpus in the LLC to find an idle one but compute a
reasonable number of iterations based on the avg_idle

David can rerun his use case after disabling sched_feat(SIS_PROP)

>
> Thanks
>
> --
> Qais Yousef

2022-04-15 20:09:26

by Qais Yousef

[permalink] [raw]
Subject: Re: Scheduling tasks on idle cpu

On 04/14/22 06:09, David Laight wrote:

[...]

> > > That seems to be something different.
> > > Related to something else I've seen where a RT process is scheduled
> > > on its old cpu (to get the hot cache) but the process running on
> > > that cpu is looping in kernel - so the RT process doesn't start.
> >
> > I *think* you're hitting softirq latencies. Most likely it's the network RX
> > softirq processing the packets. If this latency is a problem, then PREEMPT_RT
> > [1] should help with this. For Android we hit this issue and there's a long
> > living out of tree patch that I'm trying to find an upstream replacement for.
>
> I suspect the costs of PREEMPT_RT would slow things down too much.

It shouldn't. If it did, it's worth reporting to the RT folks, or considering
whether some bad usage in userspace is causing the problem.

The linux-rt-users mailing list is a good place to ask questions. The details
are in the linked Linux Foundation realtime wiki page.

> This test system has 40 cpu, 35 of them are RT and processing the same 'jobs'.
> It doesn't really matter if one is delayed by the network irq + softirq code.
> The problems arise if they all stop.
> The 'job' list was protected by a mutex - usually not too bad.
> But if a network irq interrupts the code while it holds the mutex then all
> the RT tasks stall until the softirq code completes.
> I've replaced the linked list with an array and used atomic_inc().

I see. So an interrupt that happens at the wrong time could block everything.

You can try 'threadirqs' kernel parameter to see if this helps. PREEMPT_RT will
help with softirq latencies too. So I think this problem should be handled by
PREEMPT_RT.

There's _probably_ room for improving how userspace manages the job list too.
Do the readers have to block?

You can use irq affinities and task affinities to ensure the two never happen
on the same cpu.
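For example (a sketch only: the irq number and cpu ranges below are made-up values, and the exact knobs vary by system):

```shell
# Steer the NIC's interrupts onto cpus the RT workers never use.
# irq 45 and the cpu lists are illustrative, not from David's system.
echo 0-4 > /proc/irq/45/smp_affinity_list
# ...and pin the RT workers to the remaining cpus:
taskset -c 5-39 ./rt_app
# Persistent setups usually go through irqbalance (IRQBALANCE_BANNED_CPUS)
# or a tuned profile rather than raw echoes at boot.
```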

> I can imagine that a PREEMPT_RT kernel will have the same problem
> because (I think) all the spin locks get replaced by sleep locks.

I don't think so. The point of PREEMPT_RT is to not block RT tasks. With
PREEMPT_RT + threadirqs, irqs and softirqs will run as kernel threads. I think
they run as RT tasks, so you can manage which is more important by assigning
the right priorities to your tasks vs the irq/softirq kthreads.
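A hedged sketch of adjusting those priorities (the irq/<number>-<name> kthread naming is standard for threaded handlers; the specific irq number and priority below are illustrative):

```shell
# List the threaded irq handlers and their current RT priorities:
ps -e -o pid,rtprio,comm | grep 'irq/'
# Place one below (or above) your own RT threads, e.g. below prio-50 workers:
chrt -f -p 49 "$(pgrep -f 'irq/45-eth0')"   # irq 45 / eth0 are made up
```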

>
> >
> > There's a new knob to reduce how long netdev spends in the loop. Might be worth
> > a try:
> >
> > https://lore.kernel.org/netdev/[email protected]/
> >
> > [1] https://wiki.linuxfoundation.org/realtime/start
>
> I think the patch that runs the softirq in a separate thread might help.
> But it probably needs a test to only do that if it would 'stall' a RT process.

I think people have been using this in rt-kernels for a long time now.
I believe you'd just need to be mindful about priorities since they'll run as
RT tasks.

The threadirqs kernel parameter is available in the mainline kernel too. But
the softirq part still hadn't been merged last I checked, which was a while
ago. So in mainline irqs will get threaded, but not softirqs.

You might find good info here about tuning systems for RT from Red Hat:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/interrupt_and_process_binding

There's lots of advice regarding various aspects of the system, so it's worth
exploring if you haven't come across it before.

>
> > > I've avoided most of the pain that caused by not using a single
> > > cv_broadcast() to wake up the 34 RT threads (in this config).
> > > (Each kernel thread seemed to wake up the next one, so the
> > > delays were cumulative.)
> > > Instead there is a separate cv for each RT thread.
> > > I actually want the 'herd of wildebeest' :-)
> >
> > It seems you have a big RT app running in userspace. I thought initially you're
> > hitting issues with random kthreads or something. If you have control over
> > these tasks, then that should be easier to handle (as you suggest at the end).
>
> I've a big app with a lot of RT threads doing network send/receive.
> (All the packets are ~200 byte UDP, 50/sec on 1000+ port numbers.)
> But there are other things going on as well.
>
> > I'm not sure about the delays when using cv_broadcast(). Could it be the way
> > this library is implemented is causing the problem rather than a kernel
> > limitation?
>
> I was definitely seeing the threads wake up one by one.
> Every 10ms one of the RT threads wakes up and then wakes up all the others.
> There weren't any 'extra' system calls, once one thread was running
> in kernel the next one got woken up.
> Most (and always) noticeable were the delays getting each cpu out
> of its sleep state.

Oh yeah, idle states and DVFS are known sources of latency. You can prevent
the cpus from going into deep idle states.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/8/html-single/optimizing_rhel_8_for_real_time_for_low_latency_operation/index#con_power-saving-states_assembly_controlling-power-management-transitions
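A few standard ways to do that (a sketch; check what your kernel and hardware actually support before relying on any of these):

```shell
# At runtime: disable idle states with exit latency above 2 microseconds:
cpupower idle-set -D 2
# Or cap the allowed C-states for the whole boot via the kernel command line:
#   intel_idle.max_cstate=1 processor.max_cstate=1
# Or hold /dev/cpu_dma_latency open with a 32-bit 0 written to it while the
# app runs; the kernel avoids deep idle states for the lifetime of that fd.
```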

> But if one of the required cpu was (eg) running the softint code
> none of the latter ones would wake up.
>

[...]

> > If you make it an RT task (which I think is a good idea), then the RT scheduler
> > will handle it in the push/pull remark that seem to have started this
> > discussion and get pushed/pulled to another CPU that is running lower priority
> > task.
>
> The problem is that while I'd like this thread to start immediately
> what it is doing isn't THAT important.
> There are other things that might run on the CFS scheduler that are
> more important.
> I can make it RT for experiments.

You can isolate 35 cpus to run your RT app, if you like, and keep the remaining
5 cpus for everything else. It depends what else you use the system for. The
Red Hat guide I pasted above has a section on using the isolated cpus feature.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/tuning_guide/isolating_cpus_using_tuned-profiles-realtime

Although this seems a bit of a stretch for your use case. You can still use
irq and task affinities to ensure certain things don't happen on the same CPU.

Cheers

--
Qais Yousef

2022-04-16 00:13:49

by David Laight

[permalink] [raw]
Subject: RE: Scheduling tasks on idle cpu

From: Vincent Guittot
> Sent: 14 April 2022 11:17
...
> > > David can rerun his use case after disabling sched_feat(SIS_PROP)
> >
> > How would I do that?
>
> echo NO_SIS_PROP > /sys/kernel/debug/sched/features

That may not be in the kernel I'm using.

# cat /sys/kernel/debug/sched_features
GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER TTWU_QUEUE RT_RUNTIME_SHARE NO_LB_MIN NUMA NUMA_FAVOUR_HIGHER NO_NUMA_RESIST_LOWER

I've been looking at another ftrace output.
The scheduler does like migrating the process to the current cpu.
I have seen it migrate from one idle cpu to another idle cpu.
I've not seen it migrate from an idle cpu to the current cpu.
(But I've not looked hard.)

These are all the migrates:
TiNG task:12-1005 [026] d... 1111081.796560: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=16
TiNG task:31-1026 [005] d... 1111081.836556: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=16 dest_cpu=11
TiNG task:28-1023 [033] d... 1111081.856589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=33
TiNG task:11-1004 [013] d... 1111081.856606: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=33 dest_cpu=13
TiNG task:19-1012 [002] d... 1111081.896564: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=13 dest_cpu=10
TiNG task:26-1019 [008] d... 1111081.956551: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=10 dest_cpu=18
TiNG task:34-1029 [001] d... 1111082.016527: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=18 dest_cpu=1
TiNG task:20-1013 [021] d... 1111082.016589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=1 dest_cpu=21
TiNG task:32-1027 [000] d... 1111082.036455: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=21 dest_cpu=2
TiNG task:15-1008 [006] d... 1111082.056539: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=2 dest_cpu=14
TiNG task:34-1029 [001] d... 1111082.076536: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=14 dest_cpu=1
TiNG task:21-1014 [004] d... 1111082.076589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=1 dest_cpu=4
TiNG task:11-1004 [013] d... 1111082.096526: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=4 dest_cpu=13
TiNG task:28-1023 [033] d... 1111082.096584: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=13 dest_cpu=33
TiNG task:25-1018 [029] d... 1111082.116549: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=33 dest_cpu=11
TiNG task:27-1020 [032] d... 1111082.176519: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=16
There are a couple of places where there are 2 wakeups before the schedule.
The scheduler definitely doesn't like waking up a process on an even cpu from an odd one.
But there are also the 13->33 and 1->21 ones.

David


2022-04-16 00:27:36

by Vincent Guittot

[permalink] [raw]
Subject: Re: Scheduling tasks on idle cpu

On Thu, 14 Apr 2022 at 16:14, David Laight <[email protected]> wrote:
>
> From: Vincent Guittot
> > Sent: 14 April 2022 11:17
> ...
> > > > David can rerun his use case after disabling sched_feat(SIS_PROP)
> > >
> > > How would I do that?
> >
> > echo NO_SIS_PROP > /sys/kernel/debug/sched/features
>
> That may not be in the kernel I'm using.
>
> # cat /sys/kernel/debug/sched_features
> GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER TTWU_QUEUE RT_RUNTIME_SHARE NO_LB_MIN NUMA NUMA_FAVOUR_HIGHER NO_NUMA_RESIST_LOWER

SIS_PROP was added in v4.13, so I wonder which kernel version you are
using. Before SIS_PROP, the policy was either to loop over all cpus or none,
depending on avg_idle and avg_cost.

>
> I've been looking at another ftrace output.
> The scheduler does like migrating the process to the current cpu.
> I have seen it migrate from one idle cpu to another idle cpu.
> I've not seen it migrate from an idle cpu to the current cpu.
> (But I've not looked hard.)
>
> These are all the migrates:
> TiNG task:12-1005 [026] d... 1111081.796560: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=16
> TiNG task:31-1026 [005] d... 1111081.836556: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=16 dest_cpu=11
> TiNG task:28-1023 [033] d... 1111081.856589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=33
> TiNG task:11-1004 [013] d... 1111081.856606: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=33 dest_cpu=13
> TiNG task:19-1012 [002] d... 1111081.896564: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=13 dest_cpu=10
> TiNG task:26-1019 [008] d... 1111081.956551: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=10 dest_cpu=18
> TiNG task:34-1029 [001] d... 1111082.016527: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=18 dest_cpu=1
> TiNG task:20-1013 [021] d... 1111082.016589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=1 dest_cpu=21
> TiNG task:32-1027 [000] d... 1111082.036455: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=21 dest_cpu=2
> TiNG task:15-1008 [006] d... 1111082.056539: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=2 dest_cpu=14
> TiNG task:34-1029 [001] d... 1111082.076536: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=14 dest_cpu=1
> TiNG task:21-1014 [004] d... 1111082.076589: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=1 dest_cpu=4
> TiNG task:11-1004 [013] d... 1111082.096526: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=4 dest_cpu=13
> TiNG task:28-1023 [033] d... 1111082.096584: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=13 dest_cpu=33
> TiNG task:25-1018 [029] d... 1111082.116549: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=33 dest_cpu=11
> TiNG task:27-1020 [032] d... 1111082.176519: sched_migrate_task: comm=RTP sockets pid=990 prio=120 orig_cpu=11 dest_cpu=16
> There are a couple of places where there are 2 wakeups before the schedule.
> The scheduler definitely doesn't like waking up a process on an even cpu from an odd one.

odd and even cpus don't belong to the same LLC, and we only migrate within
the LLC at wakeup

> But there are also the 13->33 and 1->21 ones.
>
> David
>