2019-07-24 14:09:56

by Paul E. McKenney

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Wed, Jul 24, 2019 at 03:22:59PM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 24, 2019 at 04:53:31AM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > One of the callback-invocation forward-progress issues turns out to
> > be nohz_full CPUs not turning their scheduling-clock interrupt back on
> > when running in kernel mode. Given that callback floods can cause RCU's
> > callback-invocation loop to run for some time, it would be good for this
> > loop to re-enable this interrupt. Of course, this problem applies to
> > pretty much any kernel code that might loop for an extended time period,
> > not just RCU.
> >
> > I took a quick look at kernel/time/tick-sched.c and the closest thing
> > I found was tick_nohz_full_kick_cpu(), except that (1) it isn't clear
> > that this does much when invoked on the current CPU and (2) it doesn't
> > help in rcutorture TREE04. In contrast, disabling NO_HZ_FULL and using
> > RCU_NOCB_CPU instead works quite well.
> >
> > So what should I be calling instead of tick_nohz_full_kick_cpu() to
> > re-enable the current CPU's scheduling-clock interrupt?
>
> Indeed, kernel code is assumed to be quick enough (between two extended grace
> periods) to avoid running the tick for RCU. But some long lasting kernel code
> may require to tick temporarily.
>
> You can use tick_nohz_dep_set_cpu(cpu, TICK_DEP_MASK_RCU) with the
> following:
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index f92a10b5e112..3f476e2a4bf7 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -108,7 +108,8 @@ enum tick_dep_bits {
> TICK_DEP_BIT_POSIX_TIMER = 0,
> TICK_DEP_BIT_PERF_EVENTS = 1,
> TICK_DEP_BIT_SCHED = 2,
> - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> + TICK_DEP_BIT_RCU = 4
> };
>
> #define TICK_DEP_MASK_NONE 0
> @@ -116,6 +117,7 @@ enum tick_dep_bits {
> #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
>
> #ifdef CONFIG_NO_HZ_COMMON
> extern bool tick_nohz_enabled;

I will give this a try, thank you! (Testing will take a few days.)

Thanx, Paul


2019-07-24 16:30:08

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Wed, Jul 24, 2019 at 06:52:19AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 24, 2019 at 03:22:59PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 24, 2019 at 04:53:31AM -0700, Paul E. McKenney wrote:
> > > Hello!
> > >
> > > One of the callback-invocation forward-progress issues turns out to
> > > be nohz_full CPUs not turning their scheduling-clock interrupt back on
> > > when running in kernel mode. Given that callback floods can cause RCU's
> > > callback-invocation loop to run for some time, it would be good for this
> > > loop to re-enable this interrupt. Of course, this problem applies to
> > > pretty much any kernel code that might loop for an extended time period,
> > > not just RCU.
> > >
> > > I took a quick look at kernel/time/tick-sched.c and the closest thing
> > > I found was tick_nohz_full_kick_cpu(), except that (1) it isn't clear
> > > that this does much when invoked on the current CPU and (2) it doesn't
> > > help in rcutorture TREE04. In contrast, disabling NO_HZ_FULL and using
> > > RCU_NOCB_CPU instead works quite well.
> > >
> > > So what should I be calling instead of tick_nohz_full_kick_cpu() to
> > > re-enable the current CPU's scheduling-clock interrupt?
> >
> > Indeed, kernel code is assumed to be quick enough (between two extended grace
> > periods) to avoid running the tick for RCU. But some long lasting kernel code
> > may require to tick temporarily.
> >
> > You can use tick_nohz_dep_set_cpu(cpu, TICK_DEP_MASK_RCU) with the
> > following:
> >
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index f92a10b5e112..3f476e2a4bf7 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -108,7 +108,8 @@ enum tick_dep_bits {
> > TICK_DEP_BIT_POSIX_TIMER = 0,
> > TICK_DEP_BIT_PERF_EVENTS = 1,
> > TICK_DEP_BIT_SCHED = 2,
> > - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> > + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> > + TICK_DEP_BIT_RCU = 4
> > };
> >
> > #define TICK_DEP_MASK_NONE 0
> > @@ -116,6 +117,7 @@ enum tick_dep_bits {
> > #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> > #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> > #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> > +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
> >
> > #ifdef CONFIG_NO_HZ_COMMON
> > extern bool tick_nohz_enabled;
>
> I will give this a try, thank you! (Testing will take a few days.)

Sure!

For the background: expect a self-IPI to fire which then restart the tick on IRQ exit.
Once you're later done with the work, don't forget to remove the tick dependency:

tick_nohz_dep_clear_cpu(cpu, TICK_DEP_MASK_RCU);


Thanks.

2019-07-25 05:51:18

by Paul E. McKenney

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Wed, Jul 24, 2019 at 04:30:13PM +0200, Frederic Weisbecker wrote:
> On Wed, Jul 24, 2019 at 06:52:19AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 24, 2019 at 03:22:59PM +0200, Frederic Weisbecker wrote:
> > > On Wed, Jul 24, 2019 at 04:53:31AM -0700, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > One of the callback-invocation forward-progress issues turns out to
> > > > be nohz_full CPUs not turning their scheduling-clock interrupt back on
> > > > when running in kernel mode. Given that callback floods can cause RCU's
> > > > callback-invocation loop to run for some time, it would be good for this
> > > > loop to re-enable this interrupt. Of course, this problem applies to
> > > > pretty much any kernel code that might loop for an extended time period,
> > > > not just RCU.
> > > >
> > > > I took a quick look at kernel/time/tick-sched.c and the closest thing
> > > > I found was tick_nohz_full_kick_cpu(), except that (1) it isn't clear
> > > > that this does much when invoked on the current CPU and (2) it doesn't
> > > > help in rcutorture TREE04. In contrast, disabling NO_HZ_FULL and using
> > > > RCU_NOCB_CPU instead works quite well.
> > > >
> > > > So what should I be calling instead of tick_nohz_full_kick_cpu() to
> > > > re-enable the current CPU's scheduling-clock interrupt?
> > >
> > > Indeed, kernel code is assumed to be quick enough (between two extended grace
> > > periods) to avoid running the tick for RCU. But some long lasting kernel code
> > > may require to tick temporarily.
> > >
> > > You can use tick_nohz_dep_set_cpu(cpu, TICK_DEP_MASK_RCU) with the
> > > following:
> > >
> > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > index f92a10b5e112..3f476e2a4bf7 100644
> > > --- a/include/linux/tick.h
> > > +++ b/include/linux/tick.h
> > > @@ -108,7 +108,8 @@ enum tick_dep_bits {
> > > TICK_DEP_BIT_POSIX_TIMER = 0,
> > > TICK_DEP_BIT_PERF_EVENTS = 1,
> > > TICK_DEP_BIT_SCHED = 2,
> > > - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> > > + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> > > + TICK_DEP_BIT_RCU = 4
> > > };
> > >
> > > #define TICK_DEP_MASK_NONE 0
> > > @@ -116,6 +117,7 @@ enum tick_dep_bits {
> > > #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> > > #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> > > #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> > > +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
> > >
> > > #ifdef CONFIG_NO_HZ_COMMON
> > > extern bool tick_nohz_enabled;
> >
> > I will give this a try, thank you! (Testing will take a few days.)
>
> Sure!
>
> For the background: expect a self-IPI to fire which then restart the tick on IRQ exit.
> Once you're later done with the work, don't forget to remove the tick dependency:
>
> tick_nohz_dep_clear_cpu(cpu, TICK_DEP_MASK_RCU);

Does this get cleared automatically if preempted, or should I clear
this is one of RCU's scheduler hooks? (My guess is the latter, but
I figured I should ask.)

Thanx, Paul

2019-07-30 03:17:01

by Paul E. McKenney

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Wed, Jul 24, 2019 at 06:12:43PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 24, 2019 at 04:30:13PM +0200, Frederic Weisbecker wrote:
> > On Wed, Jul 24, 2019 at 06:52:19AM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 24, 2019 at 03:22:59PM +0200, Frederic Weisbecker wrote:
> > > > On Wed, Jul 24, 2019 at 04:53:31AM -0700, Paul E. McKenney wrote:
> > > > > Hello!
> > > > >
> > > > > One of the callback-invocation forward-progress issues turns out to
> > > > > be nohz_full CPUs not turning their scheduling-clock interrupt back on
> > > > > when running in kernel mode. Given that callback floods can cause RCU's
> > > > > callback-invocation loop to run for some time, it would be good for this
> > > > > loop to re-enable this interrupt. Of course, this problem applies to
> > > > > pretty much any kernel code that might loop for an extended time period,
> > > > > not just RCU.
> > > > >
> > > > > I took a quick look at kernel/time/tick-sched.c and the closest thing
> > > > > I found was tick_nohz_full_kick_cpu(), except that (1) it isn't clear
> > > > > that this does much when invoked on the current CPU and (2) it doesn't
> > > > > help in rcutorture TREE04. In contrast, disabling NO_HZ_FULL and using
> > > > > RCU_NOCB_CPU instead works quite well.
> > > > >
> > > > > So what should I be calling instead of tick_nohz_full_kick_cpu() to
> > > > > re-enable the current CPU's scheduling-clock interrupt?
> > > >
> > > > Indeed, kernel code is assumed to be quick enough (between two extended grace
> > > > periods) to avoid running the tick for RCU. But some long lasting kernel code
> > > > may require to tick temporarily.
> > > >
> > > > You can use tick_nohz_dep_set_cpu(cpu, TICK_DEP_MASK_RCU) with the
> > > > following:
> > > >
> > > > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > > > index f92a10b5e112..3f476e2a4bf7 100644
> > > > --- a/include/linux/tick.h
> > > > +++ b/include/linux/tick.h
> > > > @@ -108,7 +108,8 @@ enum tick_dep_bits {
> > > > TICK_DEP_BIT_POSIX_TIMER = 0,
> > > > TICK_DEP_BIT_PERF_EVENTS = 1,
> > > > TICK_DEP_BIT_SCHED = 2,
> > > > - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> > > > + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> > > > + TICK_DEP_BIT_RCU = 4
> > > > };
> > > >
> > > > #define TICK_DEP_MASK_NONE 0
> > > > @@ -116,6 +117,7 @@ enum tick_dep_bits {
> > > > #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> > > > #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> > > > #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> > > > +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
> > > >
> > > > #ifdef CONFIG_NO_HZ_COMMON
> > > > extern bool tick_nohz_enabled;
> > >
> > > I will give this a try, thank you! (Testing will take a few days.)
> >
> > Sure!
> >
> > For the background: expect a self-IPI to fire which then restart the tick on IRQ exit.
> > Once you're later done with the work, don't forget to remove the tick dependency:
> >
> > tick_nohz_dep_clear_cpu(cpu, TICK_DEP_MASK_RCU);
>
> Does this get cleared automatically if preempted, or should I clear
> this is one of RCU's scheduler hooks? (My guess is the latter, but
> I figured I should ask.)

The patch below (which includes your patch) does help considerably.
However, it does have some shortcomings:

1. Adds an atomic operation (albeit a cache-local one) to
the scheduler fastpath. One approach would be to have
a way of testing this bit and clearing it only if set.

Another approach would be to instead clear it on the
transition to nohz_full userspace or to idle.

2. There are a lot of other places in the kernel that are in
need of this bit being set. I am therefore considering making
multi_cpu_stop() or its callers set this bit on all CPUs upon
entry and clear it upon exit. While in this state, it is
likely necessary to disable clearing this bit. Or it would
be necessary to make multi_cpu_stop() repeat clearing the bit
every so often.

As it stands, I have CPU hotplug removal operations taking
more than 400 seconds.

3. It was tempting to ask for this bit to be tracked on a per-task
basis, but from what I can see that adds at least as much
complexity as it removes.

Thoughts?

Thanx, Paul

PS. Outage on @linux.ibm.com, hence the CC of my gmail address.

------------------------------------------------------------------------

diff --git a/include/linux/tick.h b/include/linux/tick.h
index 196a0a7bfc4f..0dea6fb33a11 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -108,7 +108,8 @@ enum tick_dep_bits {
TICK_DEP_BIT_POSIX_TIMER = 0,
TICK_DEP_BIT_PERF_EVENTS = 1,
TICK_DEP_BIT_SCHED = 2,
- TICK_DEP_BIT_CLOCK_UNSTABLE = 3
+ TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
+ TICK_DEP_BIT_RCU = 4
};

#define TICK_DEP_MASK_NONE 0
@@ -116,6 +117,7 @@ enum tick_dep_bits {
#define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
#define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
#define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
+#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)

#ifdef CONFIG_NO_HZ_COMMON
extern bool tick_nohz_enabled;
@@ -258,6 +260,9 @@ static inline bool tick_nohz_full_enabled(void) { return false; }
static inline bool tick_nohz_full_cpu(int cpu) { return false; }
static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }

+static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
+static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
+
static inline void tick_dep_set(enum tick_dep_bits bit) { }
static inline void tick_dep_clear(enum tick_dep_bits bit) { }
static inline void tick_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
diff --git a/kernel/rcu/rcutorture.c b/kernel/rcu/rcutorture.c
index 3c9feca1eab1..e6a74b9fb0b8 100644
--- a/kernel/rcu/rcutorture.c
+++ b/kernel/rcu/rcutorture.c
@@ -44,6 +44,7 @@
#include <linux/sched/debug.h>
#include <linux/sched/sysctl.h>
#include <linux/oom.h>
+#include <linux/tick.h>

#include "rcu.h"

@@ -1353,6 +1354,7 @@ static void rcu_torture_timer(struct timer_list *unused)
static int
rcu_torture_reader(void *arg)
{
+ unsigned long i = 0;
unsigned long lastsleep = jiffies;
long myid = (long)arg;
int mynumonline = myid;
@@ -1363,26 +1365,31 @@ rcu_torture_reader(void *arg)
set_user_nice(current, MAX_NICE);
if (irqreader && cur_ops->irq_capable)
timer_setup_on_stack(&t, rcu_torture_timer, 0);
-
do {
if (irqreader && cur_ops->irq_capable) {
if (!timer_pending(&t))
mod_timer(&t, jiffies + 1);
}
- if (!rcu_torture_one_read(&rand))
+ if (!rcu_torture_one_read(&rand) && !torture_must_stop())
schedule_timeout_interruptible(HZ);
- if (time_after(jiffies, lastsleep)) {
+ if (time_after(jiffies, lastsleep) && !torture_must_stop()) {
schedule_timeout_interruptible(1);
lastsleep = jiffies + 10;
}
while (num_online_cpus() < mynumonline && !torture_must_stop())
schedule_timeout_interruptible(HZ / 5);
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL) && i++ & 0x1f)
+ tick_nohz_dep_set_cpu(raw_smp_processor_id(),
+ TICK_DEP_MASK_RCU);
stutter_wait("rcu_torture_reader");
} while (!torture_must_stop());
if (irqreader && cur_ops->irq_capable) {
del_timer_sync(&t);
destroy_timer_on_stack(&t);
}
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(raw_smp_processor_id(),
+ TICK_DEP_MASK_RCU);
torture_kthread_stopping("rcu_torture_reader");
return 0;
}
@@ -1729,10 +1736,12 @@ static void rcu_torture_fwd_prog_cond_resched(unsigned long iter)
// Real call_rcu() floods hit userspace, so emulate that.
if (need_resched() || (iter & 0xfff))
schedule();
- } else {
- // No userspace emulation: CB invocation throttles call_rcu()
- cond_resched();
+ return;
}
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_set_cpu(smp_processor_id(), TICK_DEP_MASK_RCU);
+ // No userspace emulation: CB invocation throttles call_rcu()
+ cond_resched();
}

/*
@@ -1760,6 +1769,9 @@ static unsigned long rcu_torture_fwd_prog_cbfree(void)
freed++;
rcu_torture_fwd_prog_cond_resched(freed);
}
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(raw_smp_processor_id(),
+ TICK_DEP_MASK_RCU);
return freed;
}

@@ -1826,6 +1838,9 @@ static void rcu_torture_fwd_prog_nr(int *tested, int *tested_tries)
destroy_rcu_head_on_stack(&fcs.rh);
}
schedule_timeout_uninterruptible(HZ / 10); /* Let kthreads recover. */
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(raw_smp_processor_id(),
+ TICK_DEP_MASK_RCU);
WRITE_ONCE(rcu_fwd_cb_nodelay, false);
}

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 256622168356..4f0eb24d986e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2178,6 +2178,10 @@ static void rcu_do_batch(struct rcu_data *rdp)
cond_resched_tasks_rcu_qs();
lockdep_assert_irqs_enabled();
local_bh_disable();
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL) &&
+ !likely(-rcl.len & 31))
+ tick_nohz_dep_set_cpu(smp_processor_id(),
+ TICK_DEP_MASK_RCU);
}
}

@@ -2217,6 +2221,9 @@ static void rcu_do_batch(struct rcu_data *rdp)
/* Re-invoke RCU core processing if there are callbacks remaining. */
if (!offloaded && rcu_segcblist_ready_cbs(&rdp->cblist))
invoke_rcu_core();
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(raw_smp_processor_id(),
+ TICK_DEP_MASK_RCU);
}

/*
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 379cb7e50a62..7d285adc34c6 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -329,6 +329,8 @@ void rcu_note_context_switch(bool preempt)
rcu_qs();
if (rdp->exp_deferred_qs)
rcu_report_exp_rdp(rdp);
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(smp_processor_id(), TICK_DEP_MASK_RCU);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
@@ -853,6 +855,8 @@ void rcu_note_context_switch(bool preempt)
if (!preempt)
rcu_tasks_qs(current);
out:
+ if (IS_ENABLED(CONFIG_NO_HZ_FULL))
+ tick_nohz_dep_clear_cpu(smp_processor_id(), TICK_DEP_MASK_RCU);
trace_rcu_utilization(TPS("End context switch"));
}
EXPORT_SYMBOL_GPL(rcu_note_context_switch);
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index f4ee1a3428ae..e617421a2c97 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -324,6 +324,7 @@ void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit)
preempt_enable();
}
}
+EXPORT_SYMBOL_GPL(tick_nohz_dep_set_cpu);

void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)
{
@@ -331,6 +332,7 @@ void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit)

atomic_andnot(BIT(bit), &ts->tick_dep_mask);
}
+EXPORT_SYMBOL_GPL(tick_nohz_dep_clear_cpu);

/*
* Set a per-task tick dependency. Posix CPU timers need this in order to elapse

2019-07-30 18:06:20

by Frederic Weisbecker

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Mon, Jul 29, 2019 at 03:32:38PM -0700, Paul E. McKenney wrote:
> On Wed, Jul 24, 2019 at 06:12:43PM -0700, Paul E. McKenney wrote:
>
> The patch below (which includes your patch) does help considerably.
> However, it does have some shortcomings:
>
> 1. Adds an atomic operation (albeit a cache-local one) to
> the scheduler fastpath. One approach would be to have
> a way of testing this bit and clearing it only if set.
>
> Another approach would be to instead clear it on the
> transition to nohz_full userspace or to idle.

Well, the latter would be costly as it is going to restart the tick on every
user -> kernel transitions.

>
> 2. There are a lot of other places in the kernel that are in
> need of this bit being set. I am therefore considering making
> multi_cpu_stop() or its callers set this bit on all CPUs upon
> entry and clear it upon exit. While in this state, it is
> likely necessary to disable clearing this bit. Or it would
> be necessary to make multi_cpu_stop() repeat clearing the bit
> every so often.
>
> As it stands, I have CPU hotplug removal operations taking
> more than 400 seconds.
>
> 3. It was tempting to ask for this bit to be tracked on a per-task
> basis, but from what I can see that adds at least as much
> complexity as it removes.

Yeah I forgot to answer, you can use tick_dep_set_task() for that.

>
> Thoughts?
>
> Thanx, Paul
>
> PS. Outage on @linux.ibm.com, hence the CC of my gmail address.
>
> ------------------------------------------------------------------------
>
> diff --git a/include/linux/tick.h b/include/linux/tick.h
> index 196a0a7bfc4f..0dea6fb33a11 100644
> --- a/include/linux/tick.h
> +++ b/include/linux/tick.h
> @@ -108,7 +108,8 @@ enum tick_dep_bits {
> TICK_DEP_BIT_POSIX_TIMER = 0,
> TICK_DEP_BIT_PERF_EVENTS = 1,
> TICK_DEP_BIT_SCHED = 2,
> - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> + TICK_DEP_BIT_RCU = 4
> };
>
> #define TICK_DEP_MASK_NONE 0
> @@ -116,6 +117,7 @@ enum tick_dep_bits {
> #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
>
> #ifdef CONFIG_NO_HZ_COMMON
> extern bool tick_nohz_enabled;
> @@ -258,6 +260,9 @@ static inline bool tick_nohz_full_enabled(void) { return false; }
> static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
>
> +static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
> +static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }

And I gave you the wrong APIs. Please consider_using tick_dep_set_cpu()
and tick_dep_clear_cpu() that first check if the CPU uses nohz_full.

Those should have the !CONFIG_NO_HZ_FULL stub implemented as well.

Thanks.

2019-07-30 18:13:48

by Paul E. McKenney

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Tue, Jul 30, 2019 at 06:43:10PM +0200, Frederic Weisbecker wrote:
> On Mon, Jul 29, 2019 at 03:32:38PM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 24, 2019 at 06:12:43PM -0700, Paul E. McKenney wrote:
> >
> > The patch below (which includes your patch) does help considerably.
> > However, it does have some shortcomings:
> >
> > 1. Adds an atomic operation (albeit a cache-local one) to
> > the scheduler fastpath. One approach would be to have
> > a way of testing this bit and clearing it only if set.
> >
> > Another approach would be to instead clear it on the
> > transition to nohz_full userspace or to idle.
>
> Well, the latter would be costly as it is going to restart the tick on every
> user -> kernel transitions.

You lost me on this one. I would be turning off RCU's request to
maintain the tick on transition to nohz_full userspace or to idle.
Why would the tick get turned on by a later user->kernel transition?

> > 2. There are a lot of other places in the kernel that are in
> > need of this bit being set. I am therefore considering making
> > multi_cpu_stop() or its callers set this bit on all CPUs upon
> > entry and clear it upon exit. While in this state, it is
> > likely necessary to disable clearing this bit. Or it would
> > be necessary to make multi_cpu_stop() repeat clearing the bit
> > every so often.
> >
> > As it stands, I have CPU hotplug removal operations taking
> > more than 400 seconds.
> >
> > 3. It was tempting to ask for this bit to be tracked on a per-task
> > basis, but from what I can see that adds at least as much
> > complexity as it removes.
>
> Yeah I forgot to answer, you can use tick_dep_set_task() for that.

Ah, good, that would remove my need to clear things on the scheduler
fastpaths. My guess is that I use both the per-CPU and the per-task
variant in different places, but testing will tell! ;-)

Thank you!

Thanx, Paul

> > Thoughts?
> >
> > Thanx, Paul
> >
> > PS. Outage on @linux.ibm.com, hence the CC of my gmail address.
> >
> > ------------------------------------------------------------------------
> >
> > diff --git a/include/linux/tick.h b/include/linux/tick.h
> > index 196a0a7bfc4f..0dea6fb33a11 100644
> > --- a/include/linux/tick.h
> > +++ b/include/linux/tick.h
> > @@ -108,7 +108,8 @@ enum tick_dep_bits {
> > TICK_DEP_BIT_POSIX_TIMER = 0,
> > TICK_DEP_BIT_PERF_EVENTS = 1,
> > TICK_DEP_BIT_SCHED = 2,
> > - TICK_DEP_BIT_CLOCK_UNSTABLE = 3
> > + TICK_DEP_BIT_CLOCK_UNSTABLE = 3,
> > + TICK_DEP_BIT_RCU = 4
> > };
> >
> > #define TICK_DEP_MASK_NONE 0
> > @@ -116,6 +117,7 @@ enum tick_dep_bits {
> > #define TICK_DEP_MASK_PERF_EVENTS (1 << TICK_DEP_BIT_PERF_EVENTS)
> > #define TICK_DEP_MASK_SCHED (1 << TICK_DEP_BIT_SCHED)
> > #define TICK_DEP_MASK_CLOCK_UNSTABLE (1 << TICK_DEP_BIT_CLOCK_UNSTABLE)
> > +#define TICK_DEP_MASK_RCU (1 << TICK_DEP_BIT_RCU)
> >
> > #ifdef CONFIG_NO_HZ_COMMON
> > extern bool tick_nohz_enabled;
> > @@ -258,6 +260,9 @@ static inline bool tick_nohz_full_enabled(void) { return false; }
> > static inline bool tick_nohz_full_cpu(int cpu) { return false; }
> > static inline void tick_nohz_full_add_cpus_to(struct cpumask *mask) { }
> >
> > +static inline void tick_nohz_dep_set_cpu(int cpu, enum tick_dep_bits bit) { }
> > +static inline void tick_nohz_dep_clear_cpu(int cpu, enum tick_dep_bits bit) { }
>
> And I gave you the wrong APIs. Please consider_using tick_dep_set_cpu()
> and tick_dep_clear_cpu() that first check if the CPU uses nohz_full.
>
> Those should have the !CONFIG_NO_HZ_FULL stub implemented as well.
>
> Thanks.
>

2019-08-19 02:16:45

by Paul E. McKenney

[permalink] [raw]
Subject: Re: How to turn scheduler tick on for current nohz_full CPU?

On Tue, Jul 30, 2019 at 10:36:37AM -0700, Paul E. McKenney wrote:
> On Tue, Jul 30, 2019 at 06:43:10PM +0200, Frederic Weisbecker wrote:
> > On Mon, Jul 29, 2019 at 03:32:38PM -0700, Paul E. McKenney wrote:
> > > On Wed, Jul 24, 2019 at 06:12:43PM -0700, Paul E. McKenney wrote:
> > >
> > > The patch below (which includes your patch) does help considerably.
> > > However, it does have some shortcomings:
> > >
> > > 1. Adds an atomic operation (albeit a cache-local one) to
> > > the scheduler fastpath. One approach would be to have
> > > a way of testing this bit and clearing it only if set.
> > >
> > > Another approach would be to instead clear it on the
> > > transition to nohz_full userspace or to idle.
> >
> > Well, the latter would be costly as it is going to restart the tick on every
> > user -> kernel transitions.
>
> You lost me on this one. I would be turning off RCU's request to
> maintain the tick on transition to nohz_full userspace or to idle.
> Why would the tick get turned on by a later user->kernel transition?
>
> > > 2. There are a lot of other places in the kernel that are in
> > > need of this bit being set. I am therefore considering making
> > > multi_cpu_stop() or its callers set this bit on all CPUs upon
> > > entry and clear it upon exit. While in this state, it is
> > > likely necessary to disable clearing this bit. Or it would
> > > be necessary to make multi_cpu_stop() repeat clearing the bit
> > > every so often.
> > >
> > > As it stands, I have CPU hotplug removal operations taking
> > > more than 400 seconds.
> > >
> > > 3. It was tempting to ask for this bit to be tracked on a per-task
> > > basis, but from what I can see that adds at least as much
> > > complexity as it removes.
> >
> > Yeah I forgot to answer, you can use tick_dep_set_task() for that.
>
> Ah, good, that would remove my need to clear things on the scheduler
> fastpaths. My guess is that I use both the per-CPU and the per-task
> variant in different places, but testing will tell! ;-)

And after a few bug fixes (one key fix for an embarrassing bug from Joel),
this now passes 14-hour rcutorture on the scenario that used to get RCU
CPU stall warnings once or twice per hour. Though still not material for
the upcoming merge window.

Thank you, everyone!

Thanx, Paul