2015-04-03 16:24:40

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH 1/2] nohz: add tick_nohz_full_set_cpus() API

From: Chris Metcalf <[email protected]>

This is useful, for example, to modify a cpumask indicating some
special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
include/linux/tick.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index d53ad4892a39..29456c443970 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -192,6 +192,12 @@ static inline void tick_nohz_full_clear_cpus(struct cpumask *mask)
cpumask_andnot(mask, mask, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_set_cpus(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -201,6 +207,7 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_clear_cpus(struct cpumask *mask) { }
+static inline void tick_nohz_full_set_cpus(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2


2015-04-03 16:24:38

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH 2/2] nohz: make nohz_full imply isolcpus

From: Chris Metcalf <[email protected]>

It's not clear that nohz_full is useful without isolcpus also
set, since otherwise the scheduler has to run periodically to
try to determine whether to steal work from other cores.

Signed-off-by: Chris Metcalf <[email protected]>
---
I am puzzled why this has not been done before, so I suspect
there is some argument against it that I am missing, but I
wasn't able to turn anything up by searching LKML.

kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..275f12c608f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
doms_cur = &fallback_doms;
+ tick_nohz_full_set_cpus(cpu_isolated_map);
cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
--
2.1.2

2015-04-03 17:43:10

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On Fri, Apr 03, 2015 at 12:24:08PM -0400, [email protected] wrote:
> From: Chris Metcalf <[email protected]>
>
> It's not clear that nohz_full is useful without isolcpus also
> set, since otherwise the scheduler has to run periodically to
> try to determine whether to steal work from other cores.
>
> Signed-off-by: Chris Metcalf <[email protected]>

I think Rick has a similar patch.

> ---
> I am puzzled why this has not been done before, so I suspect
> there is some argument against it that I am missing, but I
> wasn't able to turn anything up by searching LKML.
>
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..275f12c608f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
> doms_cur = alloc_sched_domains(ndoms_cur);
> if (!doms_cur)
> doms_cur = &fallback_doms;
> + tick_nohz_full_set_cpus(cpu_isolated_map);
> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> err = build_sched_domains(doms_cur[0], NULL);
> register_sched_domain_sysctl();
> --
> 2.1.2
>

2015-04-03 18:08:36

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On Fri, 2015-04-03 at 12:24 -0400, [email protected] wrote:
> From: Chris Metcalf <[email protected]>
>
> It's not clear that nohz_full is useful without isolcpus also
> set, since otherwise the scheduler has to run periodically to
> try to determine whether to steal work from other cores.
>
> Signed-off-by: Chris Metcalf <[email protected]>

Ack! nohz_full= as currently defined makes zero sense when the cpu
set (which should be spelled cpuset) remains connected to the
scheduler. Perturbation of tasks to PREVENT cpu domination is what
the scheduler does for a living. Sprinkling microsecond savers all
over the kernel is pretty silly if you don't shut down the mother lode
of perturbation.

> ---
> I am puzzled why this has not been done before, so I suspect
> there is some argument against it that I am missing, but I
> wasn't able to turn anything up by searching LKML.
>
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..275f12c608f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct
> cpumask *cpu_map)
> doms_cur = alloc_sched_domains(ndoms_cur);
> if (!doms_cur)
> doms_cur = &fallback_doms;
> + tick_nohz_full_set_cpus(cpu_isolated_map);
> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> err = build_sched_domains(doms_cur[0], NULL);
> register_sched_domain_sysctl();

2015-04-03 19:20:43

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 04/03/2015 01:42 PM, Frederic Weisbecker wrote:
> On Fri, Apr 03, 2015 at 12:24:08PM -0400, [email protected] wrote:
>> From: Chris Metcalf <[email protected]>
>>
>> It's not clear that nohz_full is useful without isolcpus also
>> set, since otherwise the scheduler has to run periodically to
>> try to determine whether to steal work from other cores.
>>
>> Signed-off-by: Chris Metcalf <[email protected]>
> I think Rick has a similar patch.

I didn't see anything relevant in linux-next, though I did see
cpu_isolated_map
made into a public symbol in a recent commit by Rik.

Rik, what's the change you're proposing that's similar to this one? Thanks!

>> ---
>> I am puzzled why this has not been done before, so I suspect
>> there is some argument against it that I am missing, but I
>> wasn't able to turn anything up by searching LKML.
>>
>> kernel/sched/core.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f0f831e8a345..275f12c608f2 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
>> doms_cur = alloc_sched_domains(ndoms_cur);
>> if (!doms_cur)
>> doms_cur = &fallback_doms;
>> + tick_nohz_full_set_cpus(cpu_isolated_map);
>> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
>> err = build_sched_domains(doms_cur[0], NULL);
>> register_sched_domain_sysctl();
>> --
>> 2.1.2
>>

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-03 19:21:19

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 04/03/2015 02:08 PM, Mike Galbraith wrote:
> On Fri, 2015-04-03 at 12:24 -0400, [email protected] wrote:
>> From: Chris Metcalf <[email protected]>
>>
>> It's not clear that nohz_full is useful without isolcpus also
>> set, since otherwise the scheduler has to run periodically to
>> try to determine whether to steal work from other cores.
>>
>> Signed-off-by: Chris Metcalf <[email protected]>
> Ack! nohz_full= as currently defined makes zero sense when the cpu
> set (which should be spelled cpuset) remains connected to the
> scheduler. Perturbation of tasks to PREVENT cpu domination is what
> the scheduler does for a living. Sprinkling microsecond savers all
> over the kernel is pretty silly if you don't shut down the mother lode
> of perturbation.

Sounds like a thumbs up for this patch, then? :-)

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-04 02:04:07

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On Fri, 2015-04-03 at 15:21 -0400, Chris Metcalf wrote:
> On 04/03/2015 02:08 PM, Mike Galbraith wrote:
> > On Fri, 2015-04-03 at 12:24 -0400, [email protected]:
> > > From: Chris Metcalf <[email protected]>
> > >
> > > It's not clear that nohz_full is useful without isolcpus also
> > > set, since otherwise the scheduler has to run periodically to
> > > try to determine whether to steal work from other cores.
> > >
> > > Signed-off-by: Chris Metcalf <[email protected]>
> > Ack! nohz_full= as currently defined makes zero sense when the cpu
> > set (which should be spelled cpuset) remains connected to the
> > scheduler. Perturbation of tasks to PREVENT cpu domination is what
> > the scheduler does for a living. Sprinkling microsecond savers all
> > over the kernel is pretty silly if you don't shut down the mother
> > lode
> > of perturbation.
>
> Sounds like a thumbs up for this patch, then? :-)

Yup. The other thumb turns in the up direction when folks start
spelling cpuset properly ;-) Static isolcpus was supposed to go away.

-Mike

2015-04-04 03:43:16

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On Sat, 2015-04-04 at 04:03 +0200, Mike Galbraith wrote:
> On Fri, 2015-04-03 at 15:21 -0400, Chris Metcalf wrote:
> > On 04/03/2015 02:08 PM, Mike Galbraith wrote:
> > > On Fri, 2015-04-03 at 12:24 -0400, [email protected]:
> > > > From: Chris Metcalf <[email protected]>
> > > >
> > > > It's not clear that nohz_full is useful without isolcpus also
> > > > set, since otherwise the scheduler has to run periodically to
> > > > try to determine whether to steal work from other cores.
> > > >
> > > > Signed-off-by: Chris Metcalf <[email protected]>
> > > Ack! nohz_full= as currently defined makes zero sense when the
> > > cpu
> > > set (which should be spelled cpuset) remains connected to the
> > > scheduler. Perturbation of tasks to PREVENT cpu domination is
> > > what
> > > the scheduler does for a living. Sprinkling microsecond savers
> > > all
> > > over the kernel is pretty silly if you don't shut down the
> > > mother
> > > lode
> > > of perturbation.
> >
> > Sounds like a thumbs up for this patch, then? :-)
>
> Yup. The other thumb turns in the up direction when folks start
> spelling cpuset properly ;-) Static isolcpus was supposed to go
> away.

Speaking of microsecond savers, the (ick) deferment experiment below
cut 60 core jitter in half. Shooting the clocksource watchdog fixes
alternating ~15us/~5us tick on my desktop box.

With workqueue twiddles and whatnot floating around, the thing is
starting to look viable.

---
kernel/sched/core.c | 5 +++--
kernel/time/clocksource.c | 5 +++++
2 files changed, 8 insertions(+), 2 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2604,12 +2604,13 @@ u64 scheduler_tick_max_deferment(void)
struct rq *rq = this_rq();
unsigned long next, now = ACCESS_ONCE(jiffies);

- next = rq->last_sched_tick + HZ;
+ next = (rq->last_sched_tick + HZ) | (rq->clock & 0x3f);

if (time_before_eq(next, now))
return 0;

- return jiffies_to_nsecs(next - now);
+ /* Add noise to avoid CPUs colliding at tick boundaries */
+ return jiffies_to_nsecs(next - now) | (rq->clock & 0xfffff);
}
#endif

--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -267,8 +267,13 @@ static void clocksource_watchdog(unsigne
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
+skip_nohz_full:
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
+ if (next_cpu && tick_nohz_full_cpu(next_cpu)) {
+ next_cpu = cpumask_next(next_cpu, cpu_online_mask);
+ goto skip_nohz_full;
+ }
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
out:

2015-04-04 14:10:49

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 04/03/2015 03:20 PM, Chris Metcalf wrote:
> On 04/03/2015 01:42 PM, Frederic Weisbecker wrote:
>> On Fri, Apr 03, 2015 at 12:24:08PM -0400, [email protected] wrote:
>>> From: Chris Metcalf <[email protected]>
>>>
>>> It's not clear that nohz_full is useful without isolcpus also
>>> set, since otherwise the scheduler has to run periodically to
>>> try to determine whether to steal work from other cores.
>>>
>>> Signed-off-by: Chris Metcalf <[email protected]>
>> I think Rick has a similar patch.
>
> I didn't see anything relevant in linux-next, though I did see
> cpu_isolated_map
> made into a public symbol in a recent commit by Rik.

I have a few patches in cgroups/for-4.1 as well that
export information about isolated and nohz_full cpus
in /sys/devices/system/cpu/

> Rik, what's the change you're proposing that's similar to this one? Thanks!

I don't have this particular one, and I like it.

I know there are use cases where isolcpus= without
nohz_full= makes sense, but I cannot think of the
reverse.

Acked-by: Rik van Riel <[email protected]>

--
All rights reversed

2015-04-04 17:09:48

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 4/4/2015 10:10 AM, Rik van Riel wrote:
>> Rik, what's the change you're proposing that's similar to this one? Thanks!
> I don't have this particular one, and I like it.
>
> I know there are use cases where isolcpus= without
> nohz_full= makes sense, but I cannot think of the
> reverse.
>
> Acked-by: Rik van Riel<[email protected]>

Thanks, I'll push it via the tile tree unless someone would prefer otherwise.
(The tick_nohz_full_set_cpus() and tick_nohz_full_clear_cpus() routines are in
earlier tile tree commits, the latter supporting a change to the tile network driver.)

Mike, I added your

Thumbs-Up-by: Mike Galbraith <[email protected]> :-)

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-05 05:05:48

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus


* Chris Metcalf <[email protected]> wrote:

> On 4/4/2015 10:10 AM, Rik van Riel wrote:
> >>Rik, what's the change you're proposing that's similar to this one? Thanks!
> >I don't have this particular one, and I like it.
> >
> >I know there are use cases where isolcpus= without
> >nohz_full= makes sense, but I cannot think of the
> >reverse.
> >
> >Acked-by: Rik van Riel<[email protected]>
>
> Thanks, I'll push it via the tile tree unless someone would prefer otherwise.

Yes, I'd prefer otherwise: please send the final, agreed upon patch to
the timer tree.

> (The tick_nohz_full_set_cpus() and tick_nohz_full_clear_cpus()
> routines are in earlier tile tree commits, the latter supporting a
> change to the tile network driver.)

This is absolutely not OK, please push this through the timer tree. We
don't do generic timer changes through architecture trees.

Thanks,

Ingo

2015-04-06 17:15:42

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 04/05/2015 01:05 AM, Ingo Molnar wrote:
> * Chris Metcalf <[email protected]> wrote:
>
>> On 4/4/2015 10:10 AM, Rik van Riel wrote:
>>>> Rik, what's the change you're proposing that's similar to this one? Thanks!
>>> I don't have this particular one, and I like it.
>>>
>>> I know there are use cases where isolcpus= without
>>> nohz_full= makes sense, but I cannot think of the
>>> reverse.
>>>
>>> Acked-by: Rik van Riel<[email protected]>
>> Thanks, I'll push it via the tile tree unless someone would prefer otherwise.
> Yes, I'd prefer otherwise: please send the final, agreed upon patch to
> the timer tree.

Not having done this before, I assume I just ask you to take patches
from LKML, or to pull from my tree?

I've set up a "timers" branch in my kernel.org git repo in any case
(git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git)
that has the commits in it.

>> (The tick_nohz_full_set_cpus() and tick_nohz_full_clear_cpus()
>> routines are in earlier tile tree commits, the latter supporting a
>> change to the tile network driver.)
> This is absolutely not OK, please push this through the timer tree. We
> don't do generic timer changes through architecture trees.

Sure; makes sense. I will post a v2 with two patches in the patchset
and see if anyone else wants to comment in the next couple of days,
and then ask you to pull, or take the patch series, at that point.

Thanks!

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-06 18:17:14

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v2 1/2] nohz: add tick_nohz_full_clear_cpus() and _set_cpus() APIs

From: Chris Metcalf <[email protected]>

The "clear" API is useful, for example, to modify a cpumask to avoid
the nohz cores so that interrupts aren't sent to them.

Likewise the "set" API is useful to modify a cpumask indicating some
special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
Frederic, if you could ack this patch (and maybe the next) before
I ask for it to be pulled into the timer tree that would be great.
I will wait a few days before asking in case anyone else has any
other issues or would like to provide an Acked-by.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9c085dc12ae9..29456c443970 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_clear_cpus(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
+}
+
+static inline void tick_nohz_full_set_cpus(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_clear_cpus(struct cpumask *mask) { }
+static inline void tick_nohz_full_set_cpus(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2

2015-04-06 18:17:26

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

From: Chris Metcalf <[email protected]>

It's not clear that nohz_full is useful without isolcpus also
set, since otherwise the scheduler has to run periodically to
try to determine whether to steal work from other cores.

Signed-off-by: Chris Metcalf <[email protected]>
Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
Acked-by: Rik van Riel <[email protected]>
---
kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..275f12c608f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
doms_cur = &fallback_doms;
+ tick_nohz_full_set_cpus(cpu_isolated_map);
cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
--
2.1.2

2015-04-06 18:29:48

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] nohz: add tick_nohz_full_clear_cpus() and _set_cpus() APIs

On Mon, Apr 06, 2015 at 02:16:44PM -0400, [email protected] wrote:
> From: Chris Metcalf <[email protected]>
>
> The "clear" API is useful, for example, to modify a cpumask to avoid
> the nohz cores so that interrupts aren't sent to them.
>
> Likewise the "set" API is useful to modify a cpumask indicating some
> special nohz-type functionality so that the nohz cores are
> automatically added to that set.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> Frederic, if you could ack this patch (and maybe the next) before
> I ask for it to be pulled into the timer tree that would be great.
> I will wait a few days before asking in case anyone else has any
> other issues or would like to provide an Acked-by.

Sure, or better yet I should take them since they are full nohz code.
And I have a pending pile to pull-request to timer tree.

Thanks.

>
> v2: put the "...set_cpus" API together with the change to
> init_sched_domains() so that they can go into the timer tree
> independently of other changes in my tree.
>
> include/linux/tick.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 9c085dc12ae9..29456c443970 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
> }
>
> +static inline void tick_nohz_full_clear_cpus(struct cpumask *mask)
> +{
> + if (tick_nohz_full_enabled())
> + cpumask_andnot(mask, mask, tick_nohz_full_mask);
> +}
> +
> +static inline void tick_nohz_full_set_cpus(struct cpumask *mask)
> +{
> + if (tick_nohz_full_enabled())
> + cpumask_or(mask, mask, tick_nohz_full_mask);
> +}
> +
> extern void __tick_nohz_full_check(void);
> extern void tick_nohz_full_kick(void);
> extern void tick_nohz_full_kick_cpu(int cpu);
> @@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
> #else
> static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> +static inline void tick_nohz_full_clear_cpus(struct cpumask *mask) { }
> +static inline void tick_nohz_full_set_cpus(struct cpumask *mask) { }
> static inline void __tick_nohz_full_check(void) { }
> static inline void tick_nohz_full_kick_cpu(int cpu) { }
> static inline void tick_nohz_full_kick(void) { }
> --
> 2.1.2
>

2015-04-06 19:10:13

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] nohz: add tick_nohz_full_clear_cpus() and _set_cpus() APIs

On 04/06/2015 02:29 PM, Frederic Weisbecker wrote:
> On Mon, Apr 06, 2015 at 02:16:44PM -0400,[email protected] wrote:
>> >From: Chris Metcalf<[email protected]>
>> >
>> >The "clear" API is useful, for example, to modify a cpumask to avoid
>> >the nohz cores so that interrupts aren't sent to them.
>> >
>> >Likewise the "set" API is useful to modify a cpumask indicating some
>> >special nohz-type functionality so that the nohz cores are
>> >automatically added to that set.
>> >
>> >Signed-off-by: Chris Metcalf<[email protected]>
>> >---
>> >Frederic, if you could ack this patch (and maybe the next) before
>> >I ask for it to be pulled into the timer tree that would be great.
>> >I will wait a few days before asking in case anyone else has any
>> >other issues or would like to provide an Acked-by.
> Sure, or better yet I should take them since they are full nohz code.
> And I have a pending pile to pull-request to timer tree.

That would be great, thanks.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-06 19:28:32

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On 04/03/2015 11:43 PM, Mike Galbraith wrote:

> Speaking of microsecond savers, the (ick) deferment experiment below
> cut 60 core jitter in half. Shooting the clocksource watchdog fixes
> alternating ~15us/~5us tick on my desktop box.
>
> With workqueue twiddles and whatnot floating around, the thing is
> starting to look viable.

Doesn't look too bad to me, though the changes below
could probably use some comments when turned into a
final patch :)

> ---
> kernel/sched/core.c | 5 +++--
> kernel/time/clocksource.c | 5 +++++
> 2 files changed, 8 insertions(+), 2 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2604,12 +2604,13 @@ u64 scheduler_tick_max_deferment(void)
> struct rq *rq = this_rq();
> unsigned long next, now = ACCESS_ONCE(jiffies);
>
> - next = rq->last_sched_tick + HZ;
> + next = (rq->last_sched_tick + HZ) | (rq->clock & 0x3f);
>
> if (time_before_eq(next, now))
> return 0;
>
> - return jiffies_to_nsecs(next - now);
> + /* Add noise to avoid CPUs colliding at tick boundaries */
> + return jiffies_to_nsecs(next - now) | (rq->clock & 0xfffff);
> }
> #endif
>
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -267,8 +267,13 @@ static void clocksource_watchdog(unsigne
> * to each other.
> */
> next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
> +skip_nohz_full:
> if (next_cpu >= nr_cpu_ids)
> next_cpu = cpumask_first(cpu_online_mask);
> + if (next_cpu && tick_nohz_full_cpu(next_cpu)) {
> + next_cpu = cpumask_next(next_cpu, cpu_online_mask);
> + goto skip_nohz_full;
> + }
> watchdog_timer.expires += WATCHDOG_INTERVAL;
> add_timer_on(&watchdog_timer, next_cpu);
> out:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

2015-04-07 03:10:28

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH 2/2] nohz: make nohz_full imply isolcpus

On Mon, 2015-04-06 at 15:28 -0400, Rik van Riel wrote:
> On 04/03/2015 11:43 PM, Mike Galbraith wrote:
>
> > Speaking of microsecond savers, the (ick) deferment experiment
> > below
> > cut 60 core jitter in half. Shooting the clocksource watchdog
> > fixes
> > alternating ~15us/~5us tick on my desktop box.
> >
> > With workqueue twiddles and whatnot floating around, the thing is
> > starting to look viable.
>
> Doesn't look too bad to me, though the changes below
> could probably use some comments when turned into a
> final patch :)

The watchdog yeah, the tick thing may want to become.. anything else.

-Mike

2015-04-07 09:33:52

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] nohz: add tick_nohz_full_clear_cpus() and _set_cpus() APIs


* Chris Metcalf <[email protected]> wrote:

> On 04/06/2015 02:29 PM, Frederic Weisbecker wrote:
> >On Mon, Apr 06, 2015 at 02:16:44PM -0400,[email protected] wrote:
> >>>From: Chris Metcalf<[email protected]>
> >>>
> >>>The "clear" API is useful, for example, to modify a cpumask to avoid
> >>>the nohz cores so that interrupts aren't sent to them.
> >>>
> >>>Likewise the "set" API is useful to modify a cpumask indicating some
> >>>special nohz-type functionality so that the nohz cores are
> >>>automatically added to that set.
> >>>
> >>>Signed-off-by: Chris Metcalf<[email protected]>
> >>>---
> >>>Frederic, if you could ack this patch (and maybe the next) before
> >>>I ask for it to be pulled into the timer tree that would be great.
> >>>I will wait a few days before asking in case anyone else has any
> >>>other issues or would like to provide an Acked-by.
> >Sure, or better yet I should take them since they are full nohz code.
> >And I have a pending pile to pull-request to timer tree.
>
> That would be great, thanks.

That's fine with me too!

My main worry was not about the specific patches but about the route
they take: it gets better review and better response to future bugs if
it goes through the usual maintainer tree (Frederic's).

Thanks,

Ingo

2015-04-07 22:29:31

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
> From: Chris Metcalf <[email protected]>
>
> It's not clear that nohz_full is useful without isolcpus also
> set, since otherwise the scheduler has to run periodically to
> try to determine whether to steal work from other cores.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> Acked-by: Rik van Riel <[email protected]>

Peter, are you ok with that change? I think that you planned to remove
cpu_isolated_map a while ago, so I prefer to wait for your ack.

Thanks.

> ---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..275f12c608f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
> doms_cur = alloc_sched_domains(ndoms_cur);
> if (!doms_cur)
> doms_cur = &fallback_doms;
> + tick_nohz_full_set_cpus(cpu_isolated_map);
> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> err = build_sched_domains(doms_cur[0], NULL);
> register_sched_domain_sysctl();
> --
> 2.1.2
>

2015-04-08 09:41:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
> From: Chris Metcalf <[email protected]>
>
> It's not clear that nohz_full is useful without isolcpus also
> set, since otherwise the scheduler has to run periodically to
> try to determine whether to steal work from other cores.

So the Changelog and the patch don't seem to agree with one another.

The Changelog states that nohz_full should depend on isolcpus.
The patch implies nohz_full for isolcpus.

These are not the same; and I don't see the argument for the former make
sense for the latter.

In specific isolcpus without nohz_full does make sense.

> Signed-off-by: Chris Metcalf <[email protected]>
> Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> Acked-by: Rik van Riel <[email protected]>
> ---
> kernel/sched/core.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..275f12c608f2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
> doms_cur = alloc_sched_domains(ndoms_cur);
> if (!doms_cur)
> doms_cur = &fallback_doms;
> + tick_nohz_full_set_cpus(cpu_isolated_map);
> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
> err = build_sched_domains(doms_cur[0], NULL);
> register_sched_domain_sysctl();

2015-04-08 14:05:12

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On 04/08/2015 05:41 AM, Peter Zijlstra wrote:
> On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
>> From: Chris Metcalf <[email protected]>
>>
>> It's not clear that nohz_full is useful without isolcpus also
>> set, since otherwise the scheduler has to run periodically to
>> try to determine whether to steal work from other cores.
> So the Changelog and the patch don't seem to agree with one another.
>
> The Changelog states that nohz_full should depend on isolcpus.

The git commit message says "make nohz_full imply isolcpus".
That's consistent with the code.

> The patch implies nohz_full for isolcpus.
>
> These are not the same; and I don't see the argument for the former make
> sense for the latter.
>
> In specific isolcpus without nohz_full does make sense.
>
>> Signed-off-by: Chris Metcalf <[email protected]>
>> Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
>> Acked-by: Rik van Riel <[email protected]>
>> ---
>> kernel/sched/core.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index f0f831e8a345..275f12c608f2 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
>> doms_cur = alloc_sched_domains(ndoms_cur);
>> if (!doms_cur)
>> doms_cur = &fallback_doms;
>> + tick_nohz_full_set_cpus(cpu_isolated_map);
>> cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
>> err = build_sched_domains(doms_cur[0], NULL);
>> register_sched_domain_sysctl();

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-08 14:26:57

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Wed, Apr 08, 2015 at 10:04:43AM -0400, Chris Metcalf wrote:
> On 04/08/2015 05:41 AM, Peter Zijlstra wrote:
> >On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
> >>From: Chris Metcalf <[email protected]>
> >>
> >>It's not clear that nohz_full is useful without isolcpus also
> >>set, since otherwise the scheduler has to run periodically to
> >>try to determine whether to steal work from other cores.
> >So the Changelog and the patch don't seem to agree with one another.
> >
> >The Changelog states that nohz_full should depend on isolcpus.
>
> The git commit message says "make nohz_full imply isolcpus".
> That's consistent with the code.

Well, but then the Changelog doesn't make any sense.

2015-04-08 15:22:13

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On 04/08/2015 10:26 AM, Peter Zijlstra wrote:
> On Wed, Apr 08, 2015 at 10:04:43AM -0400, Chris Metcalf wrote:
>> On 04/08/2015 05:41 AM, Peter Zijlstra wrote:
>>> On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
>>>> From: Chris Metcalf <[email protected]>
>>>>
>>>> It's not clear that nohz_full is useful without isolcpus also
>>>> set, since otherwise the scheduler has to run periodically to
>>>> try to determine whether to steal work from other cores.
>>> So the Changelog and the patch don't seem to agree with one another.
>>>
>>> The Changelog states that nohz_full should depend on isolcpus.
>> The git commit message says "make nohz_full imply isolcpus".
>> That's consistent with the code.
> Well, but then the Changelog doesn't make any sense.

Apparently the body of the commit message isn't as clear as it might be :-)

It does say the same thing, though, basically that if nohz_full DOESN'T
imply isolcpus, that's a bad thing. I'm happy to reword the text to avoid
the double negative and say:

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Frederic, do you want me to respin the patch, or can you just update
the text of the commit message as above?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-08 17:24:35

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Wed, Apr 08, 2015 at 11:21:56AM -0400, Chris Metcalf wrote:
> On 04/08/2015 10:26 AM, Peter Zijlstra wrote:
> >On Wed, Apr 08, 2015 at 10:04:43AM -0400, Chris Metcalf wrote:
> >>On 04/08/2015 05:41 AM, Peter Zijlstra wrote:
> >>>On Mon, Apr 06, 2015 at 02:16:45PM -0400, [email protected] wrote:
> >>>>From: Chris Metcalf <[email protected]>
> >>>>
> >>>>It's not clear that nohz_full is useful without isolcpus also
> >>>>set, since otherwise the scheduler has to run periodically to
> >>>>try to determine whether to steal work from other cores.
> >>>So the Changelog and the patch don't seem to agree with one another.
> >>>
> >>>The Changelog states that nohz_full should depend on isolcpus.
> >>The git commit message says "make nohz_full imply isolcpus".
> >>That's consistent with the code.
> >Well, but then the Changelog doesn't make any sense.
>
> Apparently the body of the commit message isn't as clear as it might be :-)
>
> It does say the same thing, though, basically that if nohz_full DOESN'T
> imply isolcpus, that's a bad thing. I'm happy to reword the text to avoid
> the double negative and say:
>
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.
>
> Frederic, do you want me to respin the patch, or can you just update
> the text of the commit message as above?

Please respin. Writing or fixing changelogs is what takes me most time :-)

2015-04-08 17:27:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Wed, Apr 08, 2015 at 11:21:56AM -0400, Chris Metcalf wrote:
> Apparently the body of the commit message isn't as clear as it might be :-)
>
> It does say the same thing, though, basically that if nohz_full DOESN'T
> imply isolcpus, that's a bad thing. I'm happy to reword the text to avoid
> the double negative and say:
>
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.

But you're doing the reverse! You're setting nohz_full for isolcpus, not
limiting the nohz_full mask to isolcpus.

2015-04-08 18:13:00

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On 04/08/2015 01:27 PM, Peter Zijlstra wrote:
> On Wed, Apr 08, 2015 at 11:21:56AM -0400, Chris Metcalf wrote:
>> Apparently the body of the commit message isn't as clear as it might be :-)
>>
>> It does say the same thing, though, basically that if nohz_full DOESN'T
>> imply isolcpus, that's a bad thing. I'm happy to reword the text to avoid
>> the double negative and say:
>>
>> nohz_full is only useful with isolcpus also set, since otherwise the
>> scheduler has to run periodically to try to determine whether to steal
>> work from other cores.
> But you're doing the reverse! You're setting nohz_full for isolcpus, not
> limiting the nohz_full mask to isolcpus.

Ah, I see. Yes, that's right. The idea is that if you are saying
"nohz_full=1-15" on the command line, you would like that to
imply "isolcpus=1-15" as well, without having to actually say so
explicitly. If we instead limit nohz_full based on isolcpus, it's not
clear that it's actually worth making any change in this area.

I still maintain that the text has always correctly (if perhaps
confusingly) said what it is that the code was doing; where the
text says "x implies y", that means "x being set forces y to be set".
But I'm respinning it anyway for Frederic so I will avoid using the
word "imply" altogether to make this clearer.

The larger question is whether you agree with the proposed semantics.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-08 19:20:46

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 1/2] nohz: add tick_nohz_full_clear_cpus() and _set_cpus() APIs

From: Chris Metcalf <[email protected]>

The "clear" API is useful, for example, to modify a cpumask to avoid
the nohz cores so that interrupts aren't sent to them.

Likewise the "set" API is useful to modify a cpumask indicating some
special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
v3: no change here, just in part 2/2.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9c085dc12ae9..29456c443970 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_clear_cpus(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
+}
+
+static inline void tick_nohz_full_set_cpus(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_clear_cpus(struct cpumask *mask) { }
+static inline void tick_nohz_full_set_cpus(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2

2015-04-08 19:20:56

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 2/2] nohz: set isolcpus when nohz_full is set

From: Chris Metcalf <[email protected]>

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <[email protected]>
Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
Acked-by: Rik van Riel <[email protected]>
---
v3: update changelog language to avoid using unclear "implies" [PeterZ]

kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..275f12c608f2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
doms_cur = &fallback_doms;
+ tick_nohz_full_set_cpus(cpu_isolated_map);
cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
--
2.1.2

2015-04-09 08:29:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Wed, Apr 08, 2015 at 02:12:34PM -0400, Chris Metcalf wrote:

> >But you're doing the reverse! You're setting nohz_full for isolcpus, not
> >limiting the nohz_full mask to isolcpus.
>
> Ah, I see. Yes, that's right.

No its not, you should correct me when I'm wrong ;-)

So the problem is that:

+ tick_nohz_full_set_cpus(cpu_isolated_map);

reads like you're doing:

nohz_full_map |= isolcpus_map

But in actual fact you're doing:

isolcpus_map |= nohz_full_map

So that function is retarded, but the logic is fine.

So NAK on both patches for the reason that they're utterly confusing as
to wtf they actually do.

2015-04-09 12:02:44

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On 4/9/2015 4:29 AM, Peter Zijlstra wrote:
> On Wed, Apr 08, 2015 at 02:12:34PM -0400, Chris Metcalf wrote:
>
>>> But you're doing the reverse! You're setting nohz_full for isolcpus, not
>>> limiting the nohz_full mask to isolcpus.
>> Ah, I see. Yes, that's right.
> No its not, you should correct me when I'm wrong ;-)

Oh, I have no problem with that :-) But, now that I know what was confusing
you about the patch I see what it was you were saying with your English
above too. I thought you were saying something like "making nohz_full
imply isolcpus" again, but you weren't. Phew, OK, I think we're done
talking at cross-purposes.

> So the problem is that:
>
> + tick_nohz_full_set_cpus(cpu_isolated_map);
>
> reads like you're doing:
>
> nohz_full_map |= isolcpus_map
>
> But in actual fact you're doing:
>
> isolcpus_map |= nohz_full_map
>
> So that function is retarded, but the logic is fine.
>
> So NAK on both patches for the reason that they're utterly confusing as
> to wtf they actually do.

What about tick_nohz_full_cpumask_or(cpu_isolated_map) ?
At that point maybe the similarity to the existing cpumask API will make
it more clear that we are modifying the argument?

If not, do you have any suggestions what might do better? Obviously
the goal is to make it something that macroizes away, otherwise I'd
suggest just explicitly using an #ifdef and cpumask_or().

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-09 12:45:30

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] nohz: make nohz_full imply isolcpus

On Thu, Apr 09, 2015 at 08:02:28AM -0400, Chris Metcalf wrote:
> On 4/9/2015 4:29 AM, Peter Zijlstra wrote:
> >On Wed, Apr 08, 2015 at 02:12:34PM -0400, Chris Metcalf wrote:
> >
> >>>But you're doing the reverse! You're setting nohz_full for isolcpus, not
> >>>limiting the nohz_full mask to isolcpus.
> >>Ah, I see. Yes, that's right.
> >No its not, you should correct me when I'm wrong ;-)
>
> Oh, I have no problem with that :-) But, now that I know what was confusing
> you about the patch I see what it was you were saying with your English
> above too. I thought you were saying something like "making nohz_full
> imply isolcpus" again, but you weren't. Phew, OK, I think we're done
> talking at cross-purposes.
>
> >So the problem is that:
> >
> >+ tick_nohz_full_set_cpus(cpu_isolated_map);
> >
> >reads like you're doing:
> >
> > nohz_full_map |= isolcpus_map
> >
> >But in actual fact you're doing:
> >
> > isolcpus_map |= nohz_full_map
> >
> >So that function is retarded, but the logic is fine.
> >
> >So NAK on both patches for the reason that they're utterly confusing as
> >to wtf they actually do.
>
> What about tick_nohz_full_cpumask_or(cpu_isolated_map) ?
> At that point maybe the similarity to the existing cpumask API will make
> it more clear that we are modifying the argument?
>
> If not, do you have any suggestions what might do better? Obviously
> the goal is to make it something that macroizes away, otherwise I'd
> suggest just explicitly using an #ifdef and cpumask_or().

Possible alternative: do it the other way around.

cpu_isolated_map is allocated and filled early (__setup or sched_init())
before tick_init() and tick_init() is before sched_init_smp() which first uses
cpu_isolated_map(). So we can call some sched_isolated_map_add(struct cpumask *cpumask)
from tick_nohz_init().

2015-04-09 17:00:03

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v5] nohz: set isolcpus when nohz_full is set

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <[email protected]>
---
Frederic wrote:
> cpu_isolated_map is allocated and filled early (__setup or sched_init())
> before tick_init() and tick_init() is before sched_init_smp() which first uses
> cpu_isolated_map(). So we can call some sched_isolated_map_add(struct cpumask *cpumask)
> from tick_nohz_init().

I'll re-send a v4 of the patch without your suggestion, just renaming
the methods to tick_nohz_full_cpumask_andnot() etc, since I still think
that that model is easier to understand - we tweak isolcpus in exactly
the spot where we first put it to use. And, we do need those
tick_nohz_full_cpumask_xxx() accessors in other places anyway --
see my earlier patch for the tilegx network driver to remove the
nohz_full cores from the set of cores that get interrupted by the driver,
for example.

That said, I'm not opposed to your idea, and we could certainly do it that
way if that's the consensus. For reference, here's what it looks like
when fleshed out; I'm calling it v5 to be sort of clear about this,
but either v4 or v5 would be fine. I left the sched_isolated_map_add()
function enabled in all kernel configurations, not just NO_HZ_FULL,
since it's pretty trivial and it felt like the #ifdefs to disable it
conditionally would be noisier than the benefit to kernel size.

include/linux/sched.h | 1 +
kernel/sched/core.c | 5 +++++
kernel/time/tick-sched.c | 3 +++
3 files changed, 9 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6d77432e14ff..18a961b9beba 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -323,6 +323,7 @@ struct task_struct;
extern int lockdep_tasklist_lock_is_held(void);
#endif /* #ifdef CONFIG_PROVE_RCU */

+extern void sched_isolated_map_add(const struct cpumask *);
extern void sched_init(void);
extern void sched_init_smp(void);
extern asmlinkage void schedule_tail(struct task_struct *prev);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..b055c5e0e65c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char *str)

__setup("isolcpus=", isolated_cpu_setup);

+void sched_isolated_map_add(const struct cpumask *cpumask)
+{
+ cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
+}
+
struct s_data {
struct sched_domain ** __percpu sd;
struct root_domain *rd;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index a4c4edac4528..b0092d02ca3f 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
for_each_cpu(cpu, tick_nohz_full_mask)
context_tracking_cpu_set(cpu);

+ /* It's not meaningful to be nohz without disabling the scheduler. */
+ sched_isolated_map_add(tick_nohz_full_mask);
+
cpu_notifier(tick_nohz_cpu_down_callback, 0);
pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
cpumask_pr_args(tick_nohz_full_mask));
--
2.1.2

2015-04-09 17:00:47

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v4 1/2] nohz: add tick_nohz_full_cpumask_or() and _andnot() APIs

The "andnot" API is useful, for example, to modify a cpumask to avoid
the nohz cores so that interrupts aren't sent to them.

Likewise the "or" API is useful to modify a cpumask indicating some
special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
v4: update accessor names to make them clearer [PeterZ]

v3: no change here, just in part 2/2.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9c085dc12ae9..ecc24e3a6fa2 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_cpumask_andnot(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
+}
+
+static inline void tick_nohz_full_cpumask_or(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_cpumask_andnot(struct cpumask *mask) { }
+static inline void tick_nohz_full_cpumask_or(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2

2015-04-09 17:00:58

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v4 2/2] nohz: set isolcpus when nohz_full is set

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <[email protected]>
Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
Acked-by: Rik van Riel <[email protected]>
---
v4: update accessor names to make them clearer [PeterZ]

v3: update changelog language to avoid using unclear "implies" [PeterZ]

kernel/sched/core.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..f8bce1b24cf3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6836,6 +6836,7 @@ static int init_sched_domains(const struct cpumask *cpu_map)
doms_cur = alloc_sched_domains(ndoms_cur);
if (!doms_cur)
doms_cur = &fallback_doms;
+ tick_nohz_full_cpumask_or(cpu_isolated_map);
cpumask_andnot(doms_cur[0], cpu_map, cpu_isolated_map);
err = build_sched_domains(doms_cur[0], NULL);
register_sched_domain_sysctl();
--
2.1.2

2015-04-09 17:07:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] nohz: add tick_nohz_full_cpumask_or() and _andnot() APIs

On Thu, Apr 09, 2015 at 01:00:38PM -0400, Chris Metcalf wrote:
> +static inline void tick_nohz_full_cpumask_or(struct cpumask *mask)

This still reads as if you're doing: nohz_full_mask |= mask.

I think the suggestion done by Frederic is the right one, reverse the
lot, have:

isolcpu_map_or(nohz_full_map) := isolcpus_map |= nohz_full_map

Or just completely give up and just write readable code under an #ifdef.

2015-04-09 17:13:07

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v5] nohz: set isolcpus when nohz_full is set

On Thu, Apr 09, 2015 at 12:59:39PM -0400, Chris Metcalf wrote:
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6d77432e14ff..18a961b9beba 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -323,6 +323,7 @@ struct task_struct;
> extern int lockdep_tasklist_lock_is_held(void);
> #endif /* #ifdef CONFIG_PROVE_RCU */
>
> +extern void sched_isolated_map_add(const struct cpumask *);
> extern void sched_init(void);
> extern void sched_init_smp(void);
> extern asmlinkage void schedule_tail(struct task_struct *prev);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..b055c5e0e65c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char *str)
>
> __setup("isolcpus=", isolated_cpu_setup);
>
> +void sched_isolated_map_add(const struct cpumask *cpumask)
> +{
> + cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
> +}
> +
> struct s_data {
> struct sched_domain ** __percpu sd;
> struct root_domain *rd;
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index a4c4edac4528..b0092d02ca3f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
> for_each_cpu(cpu, tick_nohz_full_mask)
> context_tracking_cpu_set(cpu);
>
> + /* It's not meaningful to be nohz without disabling the scheduler. */
> + sched_isolated_map_add(tick_nohz_full_mask);
> +
> cpu_notifier(tick_nohz_cpu_down_callback, 0);
> pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
> cpumask_pr_args(tick_nohz_full_mask));

Right, this could work. Although I would suggest adding a comment
somewhere that we should be careful with init order. I checked, this
appears to be ordered right, but...

2015-04-09 17:24:36

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] nohz: add tick_nohz_full_cpumask_or() and _andnot() APIs

On 04/09/2015 01:07 PM, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 01:00:38PM -0400, Chris Metcalf wrote:
>> +static inline void tick_nohz_full_cpumask_or(struct cpumask *mask)
> This still reads as if you're doing: nohz_full_mask |= mask.
>
> I think the suggestion done by Frederic is the right one, reverse the
> lot, have:
>
> isolcpu_map_or(nohz_full_map) := isolcpus_map |= nohz_full_map
>
> Or just completely give up and just write readable code under an #ifdef.

OK, so let's go with v5 (in the other thread) plus comments on init
ordering, then.
I'll repost that shortly.

However, I'd still appreciate guidance on the naming, since I do
have a patch outstanding to fiddle with cpumasks for nohz_full
(in the other case, for the tilegx network driver irq mask).

So here's the obvious readable code snippet approach:

#ifdef CONFIG_NO_HZ_FULL
cpumask_or(some_random_map, some_random_map, tick_nohz_full_map);
#endif

Some possible names so we can macroize them to no-ops:

exclude_nohz_full_cpus_from(some_random_map);
or
remove_nohz_full_cpus_from(some_random_map);

include_nohz_full_cpus_in(some_random_map);
or
add_nohz_full_cpus_to(some_random_map);

or perhaps with better namespace prefixes, but more confusing to read:

tick_nohz_full_exclude_cpus_from(some_random_map);
or
tick_nohz_full_remove_cpus_from(some_random_map);

tick_nohz_full_include_cpus_in(some_random_map);
or
tick_nohz_full_add_cpus_to(some_random_map);

Any of these sound good? Any other ideas?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-09 17:42:30

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] nohz: add tick_nohz_full_cpumask_or() and _andnot() APIs

On Thu, Apr 09, 2015 at 01:24:22PM -0400, Chris Metcalf wrote:
> However, I'd still appreciate guidance on the naming, since I do
> have a patch outstanding to fiddle with cpumasks for nohz_full
> (in the other case, for the tilegx network driver irq mask).
>
> So here's the obvious readable code snippet approach:
>
> #ifdef CONFIG_NO_HZ_FULL
> cpumask_or(some_random_map, some_random_map, tick_nohz_full_map);
> #endif
>
> Some possible names so we can macroize them to no-ops:
>
> exclude_nohz_full_cpus_from(some_random_map);
> or
> remove_nohz_full_cpus_from(some_random_map);
>
> include_nohz_full_cpus_in(some_random_map);
> or
> add_nohz_full_cpus_to(some_random_map);
>
> or perhaps with better namespace prefixes, but more confusing to read:
>
> tick_nohz_full_exclude_cpus_from(some_random_map);
> or
> tick_nohz_full_remove_cpus_from(some_random_map);
>
> tick_nohz_full_include_cpus_in(some_random_map);
> or
> tick_nohz_full_add_cpus_to(some_random_map);
>
> Any of these sound good? Any other ideas?

Yes, I think these are all clearer than previous attempts.

I'm not entirely sure which to pick though; I have a vague preference
for add/remove over include/exclude and the top set reads better then
the lower set but the lower set is more consistent in naming :/

But the important point is that these names all indicate you're going to
change the mask passed as argument.

2015-04-09 18:02:02

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v6 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

The "removes_cpus_from" API is useful, for example, to modify a cpumask
to avoid the nohz cores so that interrupts aren't sent to them.

Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
some special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
v6: I think we may finally have accessor names that are OK

v5: (skipped this patch)

v4: update accessor names to make them clearer [PeterZ]

v3: no change here, just in part 2/2.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9c085dc12ae9..8d1754c0f694 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
+static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2

2015-04-09 18:02:13

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v6 2/2] nohz: set isolcpus when nohz_full is set

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <[email protected]>
---
I've come full circle on this issue. For v5 I liked Frederic's
idea of adding an API, but the more I thought about it seemed
like an overly-generic mechanism for what is really a very specific
issue, namely the interaction between the scheduler and nohz_full.
We already have plenty of nohz_full-specific code in sched/core.c,
and I think we now have a cpumask function name that is OK, so
now I'm proposing we just stick with directly modifying the
cpu_isolated_map using the new name. However, I've moved the
update to the map right to the top of sched_init_smp(), which seems
like a cleaner place to make this kind of a change.

v6: back to just using a direct update, with the new names

v5: test run of Frederic's proposal to use sched_isolated_map_add()

v4: update accessor names to make them clearer [PeterZ]

v3: update changelog language to avoid using unclear "implies" [PeterZ]

kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..041845c568d2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);

+ /* nohz_full won't take effect without isolating the cpus. */
+ tick_nohz_full_remove_cpus_from(cpu_isolated_map);
+
sched_init_numa();

/*
--
2.1.2

2015-04-10 01:05:19

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v5] nohz: set isolcpus when nohz_full is set

On Thu, 2015-04-09 at 19:12 +0200, Peter Zijlstra wrote:
> On Thu, Apr 09, 2015 at 12:59:39PM -0400, Chris Metcalf wrote:
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index 6d77432e14ff..18a961b9beba 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -323,6 +323,7 @@ struct task_struct;
> > extern int lockdep_tasklist_lock_is_held(void);
> > #endif /* #ifdef CONFIG_PROVE_RCU */
> >
> > +extern void sched_isolated_map_add(const struct cpumask *);
> > extern void sched_init(void);
> > extern void sched_init_smp(void);
> > extern asmlinkage void schedule_tail(struct task_struct *prev);
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index f0f831e8a345..b055c5e0e65c 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char
> > *str)
> >
> > __setup("isolcpus=", isolated_cpu_setup);
> >
> > +void sched_isolated_map_add(const struct cpumask *cpumask)
> > +{
> > + cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
> > +}
> > +
> > struct s_data {
> > struct sched_domain ** __percpu sd;
> > struct root_domain *rd;
> > diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> > index a4c4edac4528..b0092d02ca3f 100644
> > --- a/kernel/time/tick-sched.c
> > +++ b/kernel/time/tick-sched.c
> > @@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
> > for_each_cpu(cpu, tick_nohz_full_mask)
> > context_tracking_cpu_set(cpu);
> >
> > + /* It's not meaningful to be nohz without disabling the
> > scheduler. */
> > + sched_isolated_map_add(tick_nohz_full_mask);
> > +
> > cpu_notifier(tick_nohz_cpu_down_callback, 0);
> > pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
> > cpumask_pr_args(tick_nohz_full_mask));
>
> Right, this could work. Although I would suggest adding a comment
> somewhere that we should be careful with init order. I checked, this
> appears to be ordered right, but...

I'd embed it in domain construction, that way it'd be ready for the
day nohz_full becomes dynamic, and people can start using cpusets to
set up /tear down isolated sets on the fly.

-Mike

2015-04-10 01:34:53

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

On Thu, Apr 09, 2015 at 02:01:45PM -0400, Chris Metcalf wrote:
> The "removes_cpus_from" API is useful, for example, to modify a cpumask
> to avoid the nohz cores so that interrupts aren't sent to them.
>
> Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
> some special nohz-type functionality so that the nohz cores are
> automatically added to that set.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> v6: I think we may finally have accessor names that are OK
>
> v5: (skipped this patch)
>
> v4: update accessor names to make them clearer [PeterZ]
>
> v3: no change here, just in part 2/2.
>
> v2: put the "...set_cpus" API together with the change to
> init_sched_domains() so that they can go into the timer tree
> independently of other changes in my tree.
>
> include/linux/tick.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 9c085dc12ae9..8d1754c0f694 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
> }
>
> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)

Or tick_nohz_full_affine() ?

> +{
> + if (tick_nohz_full_enabled())
> + cpumask_or(mask, mask, tick_nohz_full_mask);
> +}
> +
> +static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
> +{
> + if (tick_nohz_full_enabled())
> + cpumask_andnot(mask, mask, tick_nohz_full_mask);
> +}
> +
> extern void __tick_nohz_full_check(void);
> extern void tick_nohz_full_kick(void);
> extern void tick_nohz_full_kick_cpu(int cpu);
> @@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
> #else
> static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> +static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask) { }
> static inline void __tick_nohz_full_check(void) { }
> static inline void tick_nohz_full_kick_cpu(int cpu) { }
> static inline void tick_nohz_full_kick(void) { }
> --
> 2.1.2
>

2015-04-10 01:37:09

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v6 2/2] nohz: set isolcpus when nohz_full is set

On Thu, Apr 09, 2015 at 02:01:46PM -0400, Chris Metcalf wrote:
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.
>
> Accordingly, when booting with nohz_full=xxx on the command line, we
> should act as if isolcpus=xxx was also set, and set (or extend) the
> isolcpus set to include the nohz_full cpus.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> I've come full circle on this issue. For v5 I liked Frederic's
> idea of adding an API, but the more I thought about it seemed
> like an overly-generic mechanism for what is really a very specific
> issue, namely the interaction between the scheduler and nohz_full.
> We already have plenty of nohz_full-specific code in sched/core.c,
> and I think we now have a cpumask function name that is OK, so
> now I'm proposing we just stick with directly modifying the
> cpu_isolated_map using the new name. However, I've moved the
> update to the map right to the top of sched_init_smp(), which seems
> like a cleaner place to make this kind of a change.
>
> v6: back to just using a direct update, with the new names
>
> v5: test run of Frederic's proposal to use sched_isolated_map_add()
>
> v4: update accessor names to make them clearer [PeterZ]
>
> v3: update changelog language to avoid using unclear "implies" [PeterZ]
>
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..041845c568d2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
> alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
> alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
>
> + /* nohz_full won't take effect without isolating the cpus. */
> + tick_nohz_full_remove_cpus_from(cpu_isolated_map);

This should be the other one I guess.

> +
> sched_init_numa();
>
> /*
> --
> 2.1.2
>

2015-04-10 15:31:28

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v6 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

On 04/09/2015 09:34 PM, Frederic Weisbecker wrote:
> On Thu, Apr 09, 2015 at 02:01:45PM -0400, Chris Metcalf wrote:
>> --- a/include/linux/tick.h
>> +++ b/include/linux/tick.h
>> @@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
>> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
>> }
>>
>> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> Or tick_nohz_full_affine() ?

I'd vote no to that, I think - the first problem I see is that it doesn't
make very clear that it modifies the argument, which is the problem
Peter Z has been having from the beginning with some suggestions.
The second problem is that it sounds like it might be setting the
argument unconditionally to the nohz_full set, when in fact it's doing
a cpumask_or() to increase the set of cpus further.

> + /* nohz_full won't take effect without isolating the cpus. */
> + tick_nohz_full_remove_cpus_from(cpu_isolated_map);
> This should be the other one I guess.
>

<Slaps self on forehead> Thanks!

Now off to do some actual testing before sending the next patchset :-)

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-10 15:33:32

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v5] nohz: set isolcpus when nohz_full is set

On 04/09/2015 09:05 PM, Mike Galbraith wrote:
> On Thu, 2015-04-09 at 19:12 +0200, Peter Zijlstra wrote:
>> On Thu, Apr 09, 2015 at 12:59:39PM -0400, Chris Metcalf wrote:
>>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>>> index 6d77432e14ff..18a961b9beba 100644
>>> --- a/include/linux/sched.h
>>> +++ b/include/linux/sched.h
>>> @@ -323,6 +323,7 @@ struct task_struct;
>>> extern int lockdep_tasklist_lock_is_held(void);
>>> #endif /* #ifdef CONFIG_PROVE_RCU */
>>>
>>> +extern void sched_isolated_map_add(const struct cpumask *);
>>> extern void sched_init(void);
>>> extern void sched_init_smp(void);
>>> extern asmlinkage void schedule_tail(struct task_struct *prev);
>>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>>> index f0f831e8a345..b055c5e0e65c 100644
>>> --- a/kernel/sched/core.c
>>> +++ b/kernel/sched/core.c
>>> @@ -5824,6 +5824,11 @@ static int __init isolated_cpu_setup(char
>>> *str)
>>>
>>> __setup("isolcpus=", isolated_cpu_setup);
>>>
>>> +void sched_isolated_map_add(const struct cpumask *cpumask)
>>> +{
>>> + cpumask_or(cpu_isolated_map, cpu_isolated_map, cpumask);
>>> +}
>>> +
>>> struct s_data {
>>> struct sched_domain ** __percpu sd;
>>> struct root_domain *rd;
>>> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
>>> index a4c4edac4528..b0092d02ca3f 100644
>>> --- a/kernel/time/tick-sched.c
>>> +++ b/kernel/time/tick-sched.c
>>> @@ -385,6 +385,9 @@ void __init tick_nohz_init(void)
>>> for_each_cpu(cpu, tick_nohz_full_mask)
>>> context_tracking_cpu_set(cpu);
>>>
>>> + /* It's not meaningful to be nohz without disabling the
>>> scheduler. */
>>> + sched_isolated_map_add(tick_nohz_full_mask);
>>> +
>>> cpu_notifier(tick_nohz_cpu_down_callback, 0);
>>> pr_info("NO_HZ: Full dynticks CPUs: %*pbl.\n",
>>> cpumask_pr_args(tick_nohz_full_mask));
>> Right, this could work. Although I would suggest adding a comment
>> somewhere that we should be careful with init order. I checked, this
>> appears to be ordered right, but...
> I'd embed it in domain construction, that way it'd be ready for the
> day nohz_full becomes dynamic, and people can start using cpusets to
> set up /tear down isolated sets on the fly.

So, move it to the top of build_sched_domains(), and then for
every "const struct cpumask *cpu_map" argument, create a
temporary cpu_map so we can mask out the nohz_full cores?

The problem is, we already allow partition_sched_domains()
to override "isolcpus=", so it seems appropriate that you should
be able to override "nohz_full=" in the same way, which my
current patch (v6) does.

So I think the proposed solution is certainly no worse than what
we have now in terms of a future migration to cpusets.

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-10 15:57:56

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH v5] nohz: set isolcpus when nohz_full is set

On Fri, 2015-04-10 at 11:33 -0400, Chris Metcalf wrote:
>
> So I think the proposed solution is certainly no worse than what
> we have now in terms of a future migration to cpusets.

Yeah, whatever will fly in the here and now is fine.

-Mike

2015-04-10 20:54:05

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v7 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

The "removes_cpus_from" API is useful, for example, to modify a cpumask
to avoid the nohz cores so that interrupts aren't sent to them.

Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
some special nohz-type functionality so that the nohz cores are
automatically added to that set.

Signed-off-by: Chris Metcalf <[email protected]>
---
v7: no change here, just in part 2/2.

v6: I think we may finally have accessor names that are OK

v5: (skipped this patch)

v4: update accessor names to make them clearer [PeterZ]

v3: no change here, just in part 2/2.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

include/linux/tick.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 9c085dc12ae9..8d1754c0f694 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
+static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
+}
+
extern void __tick_nohz_full_check(void);
extern void tick_nohz_full_kick(void);
extern void tick_nohz_full_kick_cpu(int cpu);
@@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
+static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
static inline void tick_nohz_full_kick(void) { }
--
2.1.2

2015-04-10 20:54:13

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v7 2/2] nohz: set isolcpus when nohz_full is set

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Signed-off-by: Chris Metcalf <[email protected]>
---
v7: oops, use add_cpus_to, not remove_cpus_from [Frederic]

v6: back to just using a direct update, with the new names

v5: test run of Frederic's proposal to use sched_isolated_map_add()

v4: update accessor names to make them clearer [PeterZ]

v3: update changelog language to avoid using unclear "implies" [PeterZ]

kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..4a5681b5a4fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);

+ /* nohz_full won't take effect without isolating the cpus. */
+ tick_nohz_full_add_cpus_to(cpu_isolated_map);
+
sched_init_numa();

/*
--
2.1.2

2015-04-14 00:33:58

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

On Fri, Apr 10, 2015 at 04:53:51PM -0400, Chris Metcalf wrote:
> The "removes_cpus_from" API is useful, for example, to modify a cpumask
> to avoid the nohz cores so that interrupts aren't sent to them.
>
> Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
> some special nohz-type functionality so that the nohz cores are
> automatically added to that set.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> v7: no change here, just in part 2/2.
>
> v6: I think we may finally have accessor names that are OK
>
> v5: (skipped this patch)
>
> v4: update accessor names to make them clearer [PeterZ]
>
> v3: no change here, just in part 2/2.
>
> v2: put the "...set_cpus" API together with the change to
> init_sched_domains() so that they can go into the timer tree
> independently of other changes in my tree.
>
> include/linux/tick.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> include/linux/tick.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 9c085dc12ae9..8d1754c0f694 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
> }
>
> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> +{
> + if (tick_nohz_full_enabled())
> + cpumask_or(mask, mask, tick_nohz_full_mask);
> +}
> +
> +static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
> +{
> + if (tick_nohz_full_enabled())
> + cpumask_andnot(mask, mask, tick_nohz_full_mask);

I would prefer that you don't introduce new APIs if they aren't used in your
patchset. It seems that's the case for tick_nohz_full_remove_cpus_from().

Also we have housekeeping_affine() that affines a task to CPUs outside the
range of nohz full. In case you still need tick_nohz_full_remove_cpus_from()
in a further patchset, housekeeping_affine_cpumask() would extend the existing
naming.

> +}
> +
> extern void __tick_nohz_full_check(void);
> extern void tick_nohz_full_kick(void);
> extern void tick_nohz_full_kick_cpu(int cpu);
> @@ -194,6 +206,8 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
> #else
> static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> +static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask) { }
> static inline void __tick_nohz_full_check(void) { }
> static inline void tick_nohz_full_kick_cpu(int cpu) { }
> static inline void tick_nohz_full_kick(void) { }
> --
> 2.1.2
>

2015-04-14 00:37:19

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v7 2/2] nohz: set isolcpus when nohz_full is set

On Fri, Apr 10, 2015 at 04:53:52PM -0400, Chris Metcalf wrote:
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.
>
> Accordingly, when booting with nohz_full=xxx on the command line, we
> should act as if isolcpus=xxx was also set, and set (or extend) the
> isolcpus set to include the nohz_full cpus.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> v7: oops, use add_cpus_to, not remove_cpus_from [Frederic]
>
> v6: back to just using a direct update, with the new names
>
> v5: test run of Frederic's proposal to use sched_isolated_map_add()
>
> v4: update accessor names to make them clearer [PeterZ]
>
> v3: update changelog language to avoid using unclear "implies" [PeterZ]
>
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..4a5681b5a4fd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
> alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
> alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
>
> + /* nohz_full won't take effect without isolating the cpus. */
> + tick_nohz_full_add_cpus_to(cpu_isolated_map);
> +

I can't say I like this, but I can not come up with a better name.
So that patch is ok to me (if Peterz acks it) :-)

> sched_init_numa();
>
> /*
> --
> 2.1.2
>

2015-04-14 00:49:46

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

On 4/13/2015 8:33 PM, Frederic Weisbecker wrote:
> On Fri, Apr 10, 2015 at 04:53:51PM -0400, Chris Metcalf wrote:
>> The "removes_cpus_from" API is useful, for example, to modify a cpumask
>> to avoid the nohz cores so that interrupts aren't sent to them.
>>
>> Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
>> some special nohz-type functionality so that the nohz cores are
>> automatically added to that set.
>>
>> Signed-off-by: Chris Metcalf <[email protected]>
>> ---
>> [...]
>>
>> diff --git a/include/linux/tick.h b/include/linux/tick.h
>> index 9c085dc12ae9..8d1754c0f694 100644
>> --- a/include/linux/tick.h
>> +++ b/include/linux/tick.h
>> @@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
>> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
>> }
>>
>> +static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
>> +{
>> + if (tick_nohz_full_enabled())
>> + cpumask_or(mask, mask, tick_nohz_full_mask);
>> +}
>> +
>> +static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
>> +{
>> + if (tick_nohz_full_enabled())
>> + cpumask_andnot(mask, mask, tick_nohz_full_mask);
> I would prefer that you don't introduce new APIs if they aren't used in your
> patchset. It seems that's the case for tick_nohz_full_remove_cpus_from().

Yes, it's used in a tile-tree patch to remove nohz_full cpus from the set that
take interrupts from the tilegx network driver.

I can certainly make this patchset have just the _add_cpus_to() variant,
and the other patchset have just the _remove_cpus_from() variant.
It seemed to make sense to include them together as a matched set,
but doing it the way you suggest makes an equal if different amount of sense.

> Also we have housekeeping_affine() that affines a task to CPUs outside the
> range of nohz full. In case you still need tick_nohz_full_remove_cpus_from()
> in a further patchset, housekeeping_affine_cpumask() would extend the existing
> naming.

I like housekeeping_affine(), but it overwrites the affinity mask of
the task. So I would expect housekeeping_affine_mask() to overwrite
the specified argument cpumask, which it doesn't in my definition.

I don't know that I can think of a good name in the "housekeeping_xxx"
namespace that feels better than the one I proposed. In context of the
proposed client of the API so far, it's:

if (!network_cpus_init()) {
network_cpus_map = *cpu_online_mask;
tick_nohz_full_remove_cpus_from(&network_cpus_map);
}

If housekeeping_mask were defined for non-nohz_full I could just use
it unconditionally here. Alternately I could just put in an #ifdef for
CONFIG_NO_HZ_FULL and either use cpu_only_mask or housekeeping_mask
to initialize network_cpus_map, which dodges the bullet of creating
an acceptable API name here for the moment.

Frederic, what's your thought/preference?

--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com

2015-04-14 15:18:11

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v8 1/2] nohz: add tick_nohz_full_add_cpus_to() API

This API is useful to modify a cpumask indicating some special nohz-type
functionality so that the nohz cores are automatically added to that set.

Acked-by: Frederic Weisbecker <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
---
v8: Just include _add_cpus_to() in this patch set [Frederic]
Add Frederic's Ack

v7: no change here, just in part 2/2.

v6: I think we may finally have accessor names that are OK

v5: (skipped this patch)

v4: update accessor names to make them clearer [PeterZ]

v3: no change here, just in part 2/2.

v2: put the "...set_cpus" API together with the change to
init_sched_domains() so that they can go into the timer tree
independently of other changes in my tree.

include/linux/tick.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/include/linux/tick.h b/include/linux/tick.h
index ce655084ffe3..8d1754c0f694 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -186,6 +186,12 @@ static inline bool tick_nohz_full_cpu(int cpu)
return cpumask_test_cpu(cpu, tick_nohz_full_mask);
}

+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
+{
+ if (tick_nohz_full_enabled())
+ cpumask_or(mask, mask, tick_nohz_full_mask);
+}
+
static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
{
if (tick_nohz_full_enabled())
@@ -200,6 +206,7 @@ extern void __tick_nohz_task_switch(struct task_struct *tsk);
#else
static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask) { }
static inline void __tick_nohz_full_check(void) { }
static inline void tick_nohz_full_kick_cpu(int cpu) { }
--
2.1.2

2015-04-14 15:18:22

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v8 2/2] nohz: set isolcpus when nohz_full is set

nohz_full is only useful with isolcpus also set, since otherwise the
scheduler has to run periodically to try to determine whether to steal
work from other cores.

Accordingly, when booting with nohz_full=xxx on the command line, we
should act as if isolcpus=xxx was also set, and set (or extend) the
isolcpus set to include the nohz_full cpus.

Acked-by: Frederic Weisbecker <[email protected]>
Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
Acked-by: Rik van Riel <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
---
v8: no change in this patch, just in 1/2
Add Frederic's Ack

v7: oops, use add_cpus_to, not remove_cpus_from [Frederic]

v6: back to just using a direct update, with the new names

v5: test run of Frederic's proposal to use sched_isolated_map_add()

v4: update accessor names to make them clearer [PeterZ]

v3: update changelog language to avoid using unclear "implies" [PeterZ]

v2: add Mike and Rik's Acked-by

kernel/sched/core.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f0f831e8a345..4a5681b5a4fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
alloc_cpumask_var(&fallback_doms, GFP_KERNEL);

+ /* nohz_full won't take effect without isolating the cpus. */
+ tick_nohz_full_add_cpus_to(cpu_isolated_map);
+
sched_init_numa();

/*
--
2.1.2

2015-04-14 15:26:56

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v8 2/2] nohz: set isolcpus when nohz_full is set

On Tue, Apr 14, 2015 at 11:17:55AM -0400, Chris Metcalf wrote:
> nohz_full is only useful with isolcpus also set, since otherwise the
> scheduler has to run periodically to try to determine whether to steal
> work from other cores.
>
> Accordingly, when booting with nohz_full=xxx on the command line, we
> should act as if isolcpus=xxx was also set, and set (or extend) the
> isolcpus set to include the nohz_full cpus.
>
> Acked-by: Frederic Weisbecker <[email protected]>

No need to put my ack, since I'll add my Signed-off-by when I apply the patches.
Note this won't go through this merge window though as we are in the middle of
it already.

Peter, are you ok with these two patches?

Thanks.

> Acked-by: Mike Galbraith <[email protected]> ["thumbs up!"]
> Acked-by: Rik van Riel <[email protected]>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> v8: no change in this patch, just in 1/2
> Add Frederic's Ack
>
> v7: oops, use add_cpus_to, not remove_cpus_from [Frederic]
>
> v6: back to just using a direct update, with the new names
>
> v5: test run of Frederic's proposal to use sched_isolated_map_add()
>
> v4: update accessor names to make them clearer [PeterZ]
>
> v3: update changelog language to avoid using unclear "implies" [PeterZ]
>
> v2: add Mike and Rik's Acked-by
>
> kernel/sched/core.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index f0f831e8a345..4a5681b5a4fd 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7031,6 +7031,9 @@ void __init sched_init_smp(void)
> alloc_cpumask_var(&non_isolated_cpus, GFP_KERNEL);
> alloc_cpumask_var(&fallback_doms, GFP_KERNEL);
>
> + /* nohz_full won't take effect without isolating the cpus. */
> + tick_nohz_full_add_cpus_to(cpu_isolated_map);
> +
> sched_init_numa();
>
> /*
> --
> 2.1.2
>

2015-04-14 15:35:10

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: [PATCH v7 1/2] nohz: add tick_nohz_full_add_cpus_to() and _remove_cpus_from() APIs

On Mon, Apr 13, 2015 at 08:49:20PM -0400, Chris Metcalf wrote:
> On 4/13/2015 8:33 PM, Frederic Weisbecker wrote:
> >On Fri, Apr 10, 2015 at 04:53:51PM -0400, Chris Metcalf wrote:
> >>The "removes_cpus_from" API is useful, for example, to modify a cpumask
> >>to avoid the nohz cores so that interrupts aren't sent to them.
> >>
> >>Likewise the "add_cpus_to" API is useful to modify a cpumask indicating
> >>some special nohz-type functionality so that the nohz cores are
> >>automatically added to that set.
> >>
> >>Signed-off-by: Chris Metcalf <[email protected]>
> >>---
> >>[...]
> >>
> >>diff --git a/include/linux/tick.h b/include/linux/tick.h
> >>index 9c085dc12ae9..8d1754c0f694 100644
> >>--- a/include/linux/tick.h
> >>+++ b/include/linux/tick.h
> >>@@ -186,6 +186,18 @@ static inline bool tick_nohz_full_cpu(int cpu)
> >> return cpumask_test_cpu(cpu, tick_nohz_full_mask);
> >> }
> >>+static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask)
> >>+{
> >>+ if (tick_nohz_full_enabled())
> >>+ cpumask_or(mask, mask, tick_nohz_full_mask);
> >>+}
> >>+
> >>+static inline void tick_nohz_full_remove_cpus_from(struct cpumask *mask)
> >>+{
> >>+ if (tick_nohz_full_enabled())
> >>+ cpumask_andnot(mask, mask, tick_nohz_full_mask);
> >I would prefer that you don't introduce new APIs if they aren't used in your
> >patchset. It seems that's the case for tick_nohz_full_remove_cpus_from().
>
> Yes, it's used in a tile-tree patch to remove nohz_full cpus from the set that
> take interrupts from the tilegx network driver.
>
> I can certainly make this patchset have just the _add_cpus_to() variant,
> and the other patchset have just the _remove_cpus_from() variant.
> It seemed to make sense to include them together as a matched set,
> but doing it the way you suggest makes an equal if different amount of sense.
>
> >Also we have housekeeping_affine() that affines a task to CPUs outside the
> >range of nohz full. In case you still need tick_nohz_full_remove_cpus_from()
> >in a further patchset, housekeeping_affine_cpumask() would extend the existing
> >naming.
>
> I like housekeeping_affine(), but it overwrites the affinity mask of
> the task. So I would expect housekeeping_affine_mask() to overwrite
> the specified argument cpumask, which it doesn't in my definition.
>
> I don't know that I can think of a good name in the "housekeeping_xxx"
> namespace that feels better than the one I proposed. In context of the
> proposed client of the API so far, it's:
>
> if (!network_cpus_init()) {
> network_cpus_map = *cpu_online_mask;
> tick_nohz_full_remove_cpus_from(&network_cpus_map);
> }
>
> If housekeeping_mask were defined for non-nohz_full I could just use
> it unconditionally here. Alternately I could just put in an #ifdef for
> CONFIG_NO_HZ_FULL and either use cpu_only_mask or housekeeping_mask
> to initialize network_cpus_map, which dodges the bullet of creating
> an acceptable API name here for the moment.
>
> Frederic, what's your thought/preference?

Right, the more I look into this whole, the more I think we should perhaps
make tick_nohz_full_cpumask a read-only wide visible cpumask like we do
for the cpumask in kernel/cpu.c defined as "const struct cpumask *const tick_nohz_full_mask"
and do the cpumask operations with it.

>
> --
> Chris Metcalf, EZChip Semiconductor
> http://www.ezchip.com
>

2015-04-14 16:46:01

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v8 2/2] nohz: set isolcpus when nohz_full is set

On Tue, Apr 14, 2015 at 05:26:45PM +0200, Frederic Weisbecker wrote:
> On Tue, Apr 14, 2015 at 11:17:55AM -0400, Chris Metcalf wrote:
> > nohz_full is only useful with isolcpus also set, since otherwise the
> > scheduler has to run periodically to try to determine whether to steal
> > work from other cores.
> >
> > Accordingly, when booting with nohz_full=xxx on the command line, we
> > should act as if isolcpus=xxx was also set, and set (or extend) the
> > isolcpus set to include the nohz_full cpus.
> >
> > Acked-by: Frederic Weisbecker <[email protected]>
>
> No need to put my ack, since I'll add my Signed-off-by when I apply the patches.
> Note this won't go through this merge window though as we are in the middle of
> it already.
>
> Peter, are you ok with these two patches?

Yeah,

Acked-by: Peter Zijlstra (Intel) <[email protected]>