2014-02-19 05:21:08

by Lei Wen

[permalink] [raw]
Subject: [PATCH] sched: keep quiescent cpu out of idle balance loop

A cpu that is put into quiescent mode removes itself from the
kernel's sched_domain hierarchy. We can therefore walk the
sched_domains to check whether a cpu wants to be left undisturbed,
since idle load balancing would otherwise send an IPI to it.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 235cfa7..14230ae 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6783,6 +6783,8 @@ out_unlock:
* - When one of the busy CPUs notice that there may be an idle rebalancing
* needed, they will kick the idle load balancer, which then does idle
* load balancing for all the idle CPUs.
+ * - Exclude those cpus not inside the calling cpu's sched_domain, so that
+ * those isolated cpus can be kept in their quiescent mode.
*/
static struct {
cpumask_var_t idle_cpus_mask;
@@ -6792,10 +6794,16 @@ static struct {

static inline int find_new_ilb(void)
{
- int ilb = cpumask_first(nohz.idle_cpus_mask);
+ int ilb;
+ int cpu = smp_processor_id();
+ struct sched_domain *tmp;

- if (ilb < nr_cpu_ids && idle_cpu(ilb))
- return ilb;
+ for_each_domain(cpu, tmp) {
+ ilb = cpumask_first_and(nohz.idle_cpus_mask,
+ sched_domain_span(tmp));
+ if (ilb < nr_cpu_ids && idle_cpu(ilb))
+ return ilb;
+ }

return nr_cpu_ids;
}
--
1.8.3.2


2014-02-19 09:04:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Wed, Feb 19, 2014 at 01:20:30PM +0800, Lei Wen wrote:
> Since cpu which is put into quiescent mode, would remove itself
> from kernel's sched_domain. So we could use search sched_domain
> method to check whether this cpu don't want to be disturbed as
> idle load balance would send IPI to it.
>
> Signed-off-by: Lei Wen <[email protected]>
> ---
> kernel/sched/fair.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 235cfa7..14230ae 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6783,6 +6783,8 @@ out_unlock:
> * - When one of the busy CPUs notice that there may be an idle rebalancing
> * needed, they will kick the idle load balancer, which then does idle
> * load balancing for all the idle CPUs.
> + * - exclude those cpus not inside current call_cpu's sched_domain, so that
> + * those isolated cpu could be kept in their quisecnt mode.
> */
> static struct {
> cpumask_var_t idle_cpus_mask;
> @@ -6792,10 +6794,16 @@ static struct {
>
> static inline int find_new_ilb(void)
> {
> - int ilb = cpumask_first(nohz.idle_cpus_mask);
> + int ilb;
> + int cpu = smp_processor_id();
> + struct sched_domain *tmp;
>
> - if (ilb < nr_cpu_ids && idle_cpu(ilb))
> - return ilb;
> + for_each_domain(cpu, tmp) {
> + ilb = cpumask_first_and(nohz.idle_cpus_mask,
> + sched_domain_span(tmp));
> + if (ilb < nr_cpu_ids && idle_cpu(ilb))
> + return ilb;
> + }

The ILB code is bad; but you just made it horrible. Don't add pointless
for_each_domain() iterations.

I'm thinking something like:

ilb = cpumask_first_and(nohz.idle_cpus_mask, this_rq()->rd.span);

Should work just fine, no?

Better still would be to maybe not participate in the ILB in the first
place and leave this selection loop alone.

2014-02-20 02:42:53

by Lei Wen

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Wed, Feb 19, 2014 at 5:04 PM, Peter Zijlstra <[email protected]> wrote:
> On Wed, Feb 19, 2014 at 01:20:30PM +0800, Lei Wen wrote:
>> Since cpu which is put into quiescent mode, would remove itself
>> from kernel's sched_domain. So we could use search sched_domain
>> method to check whether this cpu don't want to be disturbed as
>> idle load balance would send IPI to it.
>>
>> Signed-off-by: Lei Wen <[email protected]>
>> ---
>> kernel/sched/fair.c | 14 +++++++++++---
>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 235cfa7..14230ae 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -6783,6 +6783,8 @@ out_unlock:
>> * - When one of the busy CPUs notice that there may be an idle rebalancing
>> * needed, they will kick the idle load balancer, which then does idle
>> * load balancing for all the idle CPUs.
>> + * - exclude those cpus not inside current call_cpu's sched_domain, so that
>> + * those isolated cpu could be kept in their quisecnt mode.
>> */
>> static struct {
>> cpumask_var_t idle_cpus_mask;
>> @@ -6792,10 +6794,16 @@ static struct {
>>
>> static inline int find_new_ilb(void)
>> {
>> - int ilb = cpumask_first(nohz.idle_cpus_mask);
>> + int ilb;
>> + int cpu = smp_processor_id();
>> + struct sched_domain *tmp;
>>
>> - if (ilb < nr_cpu_ids && idle_cpu(ilb))
>> - return ilb;
>> + for_each_domain(cpu, tmp) {
>> + ilb = cpumask_first_and(nohz.idle_cpus_mask,
>> + sched_domain_span(tmp));
>> + if (ilb < nr_cpu_ids && idle_cpu(ilb))
>> + return ilb;
>> + }
>
> The ILB code is bad; but you just made it horrible. Don't add pointless
> for_each_domain() iterations.
>
> I'm thinking something like:
>
> ilb = cpumask_first_and(nohz.idle_cpus_mask, this_rq()->rd.span);
>
> Should work just fine, no?

Yes, it gives the same result as my previous patch.

>
> Better still would be to maybe not participate in the ILB in the first
> place and leave this selection loop alone.

I don't quite get your point here...
Do you mean the idle cpu selection should happen somewhere earlier
than find_new_ilb() currently does?

Thanks,
Lei

2014-02-20 08:50:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Thu, Feb 20, 2014 at 10:42:51AM +0800, Lei Wen wrote:
> >> - int ilb = cpumask_first(nohz.idle_cpus_mask);
> >> + int ilb;
> >> + int cpu = smp_processor_id();
> >> + struct sched_domain *tmp;
> >>
> >> - if (ilb < nr_cpu_ids && idle_cpu(ilb))
> >> - return ilb;
> >> + for_each_domain(cpu, tmp) {
> >> + ilb = cpumask_first_and(nohz.idle_cpus_mask,
> >> + sched_domain_span(tmp));
> >> + if (ilb < nr_cpu_ids && idle_cpu(ilb))
> >> + return ilb;
> >> + }
> >
> > The ILB code is bad; but you just made it horrible. Don't add pointless
> > for_each_domain() iterations.
> >
> > I'm thinking something like:
> >
> > ilb = cpumask_first_and(nohz.idle_cpus_mask, this_rq()->rd.span);
> >
> > Should work just fine, no?
>
> Yes, it has the same result as my previous patch did.
>
> >
> > Better still would be to maybe not participate in the ILB in the first
> > place and leave this selection loop alone.
>
> Not quitely get your point here...
> Do you mean that you want idle cpu selection be put in earlier place
> than current find_new_ilb is?

I meant that if you stop an idle CPU setting its bit in
nohz.idle_cpus_mask, you don't have to mask it out either.

2014-02-20 09:15:35

by Lei Wen

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Thu, Feb 20, 2014 at 4:50 PM, Peter Zijlstra <[email protected]> wrote:
> On Thu, Feb 20, 2014 at 10:42:51AM +0800, Lei Wen wrote:
>> >> - int ilb = cpumask_first(nohz.idle_cpus_mask);
>> >> + int ilb;
>> >> + int cpu = smp_processor_id();
>> >> + struct sched_domain *tmp;
>> >>
>> >> - if (ilb < nr_cpu_ids && idle_cpu(ilb))
>> >> - return ilb;
>> >> + for_each_domain(cpu, tmp) {
>> >> + ilb = cpumask_first_and(nohz.idle_cpus_mask,
>> >> + sched_domain_span(tmp));
>> >> + if (ilb < nr_cpu_ids && idle_cpu(ilb))
>> >> + return ilb;
>> >> + }
>> >
>> > The ILB code is bad; but you just made it horrible. Don't add pointless
>> > for_each_domain() iterations.
>> >
>> > I'm thinking something like:
>> >
>> > ilb = cpumask_first_and(nohz.idle_cpus_mask, this_rq()->rd.span);
>> >
>> > Should work just fine, no?
>>
>> Yes, it has the same result as my previous patch did.
>>
>> >
>> > Better still would be to maybe not participate in the ILB in the first
>> > place and leave this selection loop alone.
>>
>> Not quitely get your point here...
>> Do you mean that you want idle cpu selection be put in earlier place
>> than current find_new_ilb is?
>
> I meant that if you stop an idle CPU setting its bit in
> nohz.idle_cpus_mask, you don't have to mask it out either.

Understood.
I will rework my previous patch according to your suggestion.

Thanks,
Lei

2014-02-20 09:17:42

by Lei Wen

[permalink] [raw]
Subject: [PATCH] sched: keep quiescent cpu out of idle balance loop

A cpu that is put into quiescent mode removes itself from the
kernel's sched_domain hierarchy and wants its running task left
undisturbed. But the current scheduler does not check whether a cpu
is in that mode, and still asks the quiescent cpu to respond to nohz
load balancing.

Fix it by preventing such a cpu from setting its bit in
nohz.idle_cpus_mask in the first place.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 235cfa7..bc85022 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6883,6 +6883,13 @@ void nohz_balance_enter_idle(int cpu)
if (!cpu_active(cpu))
return;

+ /*
+ * If this cpu is kept outside of the root domain, we don't bother
+ * to ask it for nohz balancing.
+ */
+ if (!cpumask_test_cpu(cpu, this_rq()->rd.span))
+ return;
+
if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
return;

--
1.8.3.2

2014-02-20 12:04:20

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Thu, Feb 20, 2014 at 05:17:04PM +0800, Lei Wen wrote:
> Cpu which is put into quiescent mode, would remove itself
> from kernel's sched_domain, and want others not disturb its
> task running. But current scheduler would not checking whether
> that cpu is setting in such mode, and still insist the quiescent
> cpu to response the nohz load balance.
>
> Fix it by preventing such cpu set nohz.idle_cpus_mask in the
> first place.
>
> Signed-off-by: Lei Wen <[email protected]>

Thanks

2014-02-20 12:23:31

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched: keep quiescent cpu out of idle balance loop

On Thu, Feb 20, 2014 at 05:17:04PM +0800, Lei Wen wrote:
> +++ b/kernel/sched/fair.c
> @@ -6883,6 +6883,13 @@ void nohz_balance_enter_idle(int cpu)
> if (!cpu_active(cpu))
> return;
>
> + /*
> + * If this cpu is kept outside of root domain, we don't bother
> + * to ask it for nohz balance.
> + */
> + if (!cpumask_test_cpu(cpu, this_rq()->rd.span))

rd->span

Otherwise the compiler gets upset. Fixed it.

> + return;
> +
> if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> return;
>
> --
> 1.8.3.2
>

2014-02-21 02:23:42

by Lei Wen

[permalink] [raw]
Subject: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

A cpu that is put into quiescent mode removes itself from the
kernel's sched_domain hierarchy and wants its running task left
undisturbed. But the current scheduler does not check whether a cpu
is in that mode, and still asks the quiescent cpu to respond to nohz
load balancing.

Fix it by preventing such a cpu from setting its bit in
nohz.idle_cpus_mask in the first place.

Signed-off-by: Lei Wen <[email protected]>
---
kernel/sched/fair.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 235cfa7..66194fc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6883,6 +6883,13 @@ void nohz_balance_enter_idle(int cpu)
if (!cpu_active(cpu))
return;

+ /*
+ * If this cpu is kept outside of the root domain, we don't bother
+ * to ask it for nohz balancing.
+ */
+ if (!cpumask_test_cpu(cpu, this_rq()->rd->span))
+ return;
+
if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
return;

--
1.8.3.2

2014-02-21 05:51:52

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

On Fri, 2014-02-21 at 10:23 +0800, Lei Wen wrote:
> Cpu which is put into quiescent mode, would remove itself
> from kernel's sched_domain, and want others not disturb its
> task running. But current scheduler would not checking whether
> that cpu is setting in such mode, and still insist the quiescent
> cpu to response the nohz load balance.

Let's isolate some CPUs.

Setup:
/-----"system" CPU0
\
\---"rtcpus" CPUs1-3

crash> runqueues
PER-CPU DATA TYPE:
struct rq runqueues;
PER-CPU ADDRESSES:
[0]: ffff88022fc12c00
[1]: ffff88022fc92c00
[2]: ffff88022fd12c00
[3]: ffff88022fd92c00
crash> struct rq ffff88022fd92c00 | grep sd
sd = 0x0, <== yup, CPU3 is isolated bit of silicon
crash> struct rq ffff88022fd92c00 | grep rd
rd = 0xffffffff81bffe60 <def_root_domain>,
crash> struct -x root_domain 0xffffffff81bffe60
...
span = {{
bits = {0xe} <== "rtcpus"
}},
crash> struct rq ffff88022fc12c00 | grep rd
rd = 0xffff8802242c5800,
crash> struct -x root_domain 0xffff8802242c5800
...
span = {{
bits = {0x1} <== "system"
}},

Turn off load balancing in "system" as well now, CPU0 loses its 'sd',
and becomes an isolated island identical to "rtcpus" CPUs, and thus..

span = {{
bits = {0xf} <== oh darn
}},

.."system" and "rtcpus" merge, all CPUs having NULL sd, as they are now
all remote silicon islands, but they now also share rd again, as if you
had never diddled domains in the first place.

-Mike

2014-02-21 07:28:50

by Lei Wen

[permalink] [raw]
Subject: Re: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

Mike,

On Fri, Feb 21, 2014 at 1:51 PM, Mike Galbraith <[email protected]> wrote:
> On Fri, 2014-02-21 at 10:23 +0800, Lei Wen wrote:
>> Cpu which is put into quiescent mode, would remove itself
>> from kernel's sched_domain, and want others not disturb its
>> task running. But current scheduler would not checking whether
>> that cpu is setting in such mode, and still insist the quiescent
>> cpu to response the nohz load balance.
>
> Let's isolate some CPUs.
>
> Setup:
> /-----"system" CPU0
> \
> \---"rtcpus" CPUs1-3
>
> crash> runqueues
> PER-CPU DATA TYPE:
> struct rq runqueues;
> PER-CPU ADDRESSES:
> [0]: ffff88022fc12c00
> [1]: ffff88022fc92c00
> [2]: ffff88022fd12c00
> [3]: ffff88022fd92c00
> crash> struct rq ffff88022fd92c00 | grep sd
> sd = 0x0, <== yup, CPU3 is isolated bit of silicon
> crash> struct rq ffff88022fd92c00 | grep rd
> rd = 0xffffffff81bffe60 <def_root_domain>,
> crash> struct -x root_domain 0xffffffff81bffe60
> ...
> span = {{
> bits = {0xe} <== "rtcpus"
> }},
> crash> struct rq ffff88022fc12c00 | grep rd
> rd = 0xffff8802242c5800,
> crash> struct -x root_domain 0xffff8802242c5800
> ...
> span = {{
> bits = {0x1} <== "system"
> }},
>
> Turn off load balancing in "system" as well now, CPU0 loses its 'sd',
> and becomes an isolated island identical to "rtcpus" CPUs, and thus..
>
> span = {{
> bits = {0xf} <== oh darn
> }},
>
> .."system" and "rtcpus" merge, all CPUs having NULL sd, as they are now
> all remote silicon islands, but they now also share rd again, as if you
> had never diddled domains in the first place.

Great catch!
Actually, what I experimented with is:
1. disable load balancing in the top cpuset
2. put cpus 0-2 into "system", and enable its load balancing
3. put cpu 3 into "rt", and disable load balancing

This way the root span seen by cpus 0-2 always covers [0-2], as you
also mentioned.
And it is true that once I disable load balancing there as well, I see
the span masks get merged.

So how about the change below?
+ if (!this_rq()->sd)
+ return;
Assuming an isolated cpu loses its sd, could you help confirm that
from crash too?

Or do you think it is wrong to merge the root domains when the
"system" group disables load balancing?

Thanks,
Lei

2014-02-21 08:35:00

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

On Fri, 2014-02-21 at 15:28 +0800, Lei Wen wrote:

> Actually, what I have experiment is as:
> 1. set top cpuset as disable load balance
> 2. set 0-2 cpus to "system", and enable its load balance
> 3. set 3 cpu to "rt" and disable load balance.

Exactly as I do, pertinent part of my cheezy script being...

# ...and fire up the shield
cset shield --userset=rtcpus --cpu=${START_CPU}-${END_CPU} --kthread=on

# If cpuset wasn't previously mounted (no obnoxious systemd),
# we just mounted it. Find the mount point.
if [ -z $CPUSET_ROOT ]; then
    CPUSET_ROOT=$(grep cpuset /proc/mounts|cut -d ' ' -f2)
    if [ -z $CPUSET_ROOT ]; then
        # If it's not mounted now, bail.
        echo EEK, cupset is not mounted!
        exit
    else
        # ok, check for cgroup mount
        if [ -f ${CPUSET_ROOT}/cpuset.cpus ]; then
            CPUSET_PREFIX=cpuset.
        fi
    fi
fi

echo 0 > ${CPUSET_ROOT}/${CPUSET_PREFIX}sched_load_balance
echo 1 > ${CPUSET_ROOT}/system/${CPUSET_PREFIX}sched_load_balance
echo 0 > ${CPUSET_ROOT}/rtcpus/${CPUSET_PREFIX}sched_load_balance
echo 0 > ${CPUSET_ROOT}/rtcpus/${CPUSET_PREFIX}sched_relax_domain_level

> While by this way, root span always covering [0-2] which is seen
> by cpu 0-2, as you also mentioned.
> And it is true that if I disable load balance, I would see span mask
> get them merged.
>
> So how about below change?
> + if (!this_rq()->sd)
> + return;
> Suppose isolated cpu would lose its sd, could you help
> confirm it from crash too?

Yeah, isolated CPUs have no sd connectivity. I sent Peter a patchlet
offline showing what I do to keep nohz at bay.
> Or, you think it is wrong to do merge job when system group disable
> the load balance?

I think the construction stuff works fine, and !->sd is the perfect cue
to tell various things to keep their grubby mitts off of a CPU.

-Mike

2014-02-21 09:15:41

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v2] sched: keep quiescent cpu out of idle balance loop

On Fri, 2014-02-21 at 09:34 +0100, Mike Galbraith wrote:

> I think the construction stuff works fine, and !->sd is the perfect cue
> to tell various things to keep their grubby mitts off of a CPU.

Take idle_balance() for instance.. not much point in dropping rq->lock
just to take it again after doing _nothing_, and we certainly wouldn't
want to go play load balancer for connected CPUs anyway, we might get
something much more important to do RSN.

(that applies to rt in general - go off and play load balancer in the
SCHED_OTHER slums? Surely you jest, elite snobs don't do scut work,
they actually might get their hands dirty;)

Or, at user discretion, telling CPUPRI stuff that no load balancing also
means no rt balancing, i.e. the user assumes responsibility for ALL task
placement, so we don't need to call neutered functions or spend cycles
maintaining data we will have no use for until the user tells us he is
finished with these CPUs.

etc.

-Mike

2014-02-21 09:16:11

by Lei Wen

[permalink] [raw]
Subject: [PATCH v3] sched: keep quiescent cpu out of idle balance loop

A cpu that is put into quiescent mode has its rq's sd member set to
NULL and wants its running task left undisturbed. But the current
scheduler does not check whether a cpu is in that mode, and still
asks the quiescent cpu to respond to nohz load balancing.

Fix it by preventing such a cpu from setting its bit in
nohz.idle_cpus_mask in the first place.

Signed-off-by: Lei Wen <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
Many thanks to Mike for pointing out, from the crash results, that the
root span gets merged when the last cpu becomes isolated!

kernel/sched/fair.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 235cfa7..af30b6a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6883,6 +6883,14 @@ void nohz_balance_enter_idle(int cpu)
if (!cpu_active(cpu))
return;

+ /*
+ * If this cpu is isolated, its rq's sd member becomes NULL.
+ * Based on this observation, we can exclude this cpu from nohz
+ * idle balancing, so that it will not be disturbed.
+ */
+ if (!this_rq()->sd)
+ return;
+
if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
return;

--
1.8.3.2

2014-02-21 09:42:33

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v3] sched: keep quiescent cpu out of idle balance loop

On Fri, 2014-02-21 at 17:15 +0800, Lei Wen wrote:

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 235cfa7..af30b6a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6883,6 +6883,14 @@ void nohz_balance_enter_idle(int cpu)
> if (!cpu_active(cpu))
> return;
>
> + /*
> + * If this cpu is isolated, its rq's sd member would becomes NULL.
> + * Base on this observation, we could exclude this cpu from nohz
> + * idle balance, so that it would not be disturbed.
> + */
> + if (!this_rq()->sd)
> + return;
> +
> if (test_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu)))
> return;

What about nohz_balance_exit_idle()?

I think Peter queued a patchlet to tell nohz balancing to go away.

-Mike