2021-02-24 14:49:01

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 0/7 v4] move update blocked load outside newidle_balance

Joel reported long preempt- and irq-off sequences in newidle_balance caused
by a large number of CPU cgroups in use and having to be updated. This
patchset moves the update outside newidle_balance, which makes it possible
to abort early during the updates when, for example, an irq is pending.

Instead of kicking a normal ILB that would wake up a CPU which is already
idle, patch 6 triggers the update of statistics in the idle thread of
the CPU before it selects and enters an idle state.
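
In short, the idle-entry flow added by patch 6 looks roughly like the
sketch below (simplified from the code in patch 6):

	/* newidle_balance() -> nohz_newidle_balance(): only flag the cpu */
	atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));

	/* do_idle(), before selecting an idle state */
	nohz_run_idle_balance(cpu);

	void nohz_run_idle_balance(int cpu)
	{
		unsigned int flags;

		flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));

		/* run the stats ILB locally unless a softirq ILB is already pending */
		if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
			_nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
	}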

Changes on v4:
- Add a dedicated bit for updating blocked load when entering idle.
This simplifies the management of concurrency with kick_ilb.

Changes on v3:
- Fixed a compilation error for !CONFIG_SMP && CONFIG_NO_HZ_COMMON
reported by kernel test robot <[email protected]>
- Took advantage of this new version to add a short description of
nohz_run_idle_balance

Changes on v2:
- Fixed some typos and updated some comments
- Added more cleanup
- Changed the way the ILB is triggered in idle thread context to remove a
possible race condition between the normal softirq ILB and this new
mechanism. The cpu can already be set in idle_cpus_mask because, even if
the cpu is added later when entering idle, it might not have been removed
yet from the previous idle phase.

Vincent Guittot (7):
sched/fair: remove update of blocked load from newidle_balance
sched/fair: remove unused return of _nohz_idle_balance
sched/fair: remove unused parameter of update_nohz_stats
sched/fair: merge for each idle cpu loop of ILB
sched/fair: reorder newidle_balance pulled_task tests
sched/fair: trigger the update of blocked load on newly idle cpu
sched/fair: reduce the window for duplicated update

kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 118 +++++++++++++++++--------------------------
kernel/sched/idle.c | 6 +++
kernel/sched/sched.h | 7 +++
4 files changed, 60 insertions(+), 73 deletions(-)

--
2.17.1


2021-02-24 14:53:46

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 6/7 v4] sched/fair: trigger the update of blocked load on newly idle cpu

Instead of waking up a random and already idle CPU, we can take advantage
of this_cpu being about to enter idle to run the ILB and update the
blocked load.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 24 +++++++++++++++++++++---
kernel/sched/idle.c | 6 ++++++
kernel/sched/sched.h | 7 +++++++
4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 88a2e2bdbabe..61ec83e52a08 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -737,7 +737,7 @@ static void nohz_csd_func(void *info)
/*
* Release the rq::nohz_csd.
*/
- flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
+ flags = atomic_fetch_andnot(NOHZ_KICK_MASK | NOHZ_NEWILB_KICK, nohz_flags(cpu));
WARN_ON(!(flags & NOHZ_KICK_MASK));

rq->idle_balance = idle_cpu(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 586f6ce0d302..46c220a4f7ed 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
return true;
}

+/*
+ * Check if we need to run the ILB for updating blocked load before entering
+ * idle state.
+ */
+void nohz_run_idle_balance(int cpu)
+{
+ unsigned int flags;
+
+ flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
+
+ /*
+ * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
+ * (ie NOHZ_STATS_KICK set) and will do the same.
+ */
+ if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
+ _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
+}
+
static void nohz_newidle_balance(struct rq *this_rq)
{
int this_cpu = this_rq->cpu;
@@ -10474,10 +10492,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
return;

/*
- * Blocked load of idle CPUs need to be updated.
- * Kick an ILB to update statistics.
+ * Set the need to trigger ILB in order to update blocked load
+ * before entering idle state.
*/
- kick_ilb(NOHZ_STATS_KICK);
+ atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
}

#else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 7199e6f23789..7a92d6054aba 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -261,6 +261,12 @@ static void cpuidle_idle_call(void)
static void do_idle(void)
{
int cpu = smp_processor_id();
+
+ /*
+ * Check if we need to update blocked load
+ */
+ nohz_run_idle_balance(cpu);
+
/*
* If the arch has a polling bit, we maintain an invariant:
*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522b1e30..0ddc9a6ff03a 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2385,9 +2385,11 @@ extern void cfs_bandwidth_usage_dec(void);
#ifdef CONFIG_NO_HZ_COMMON
#define NOHZ_BALANCE_KICK_BIT 0
#define NOHZ_STATS_KICK_BIT 1
+#define NOHZ_NEWILB_KICK_BIT 2

#define NOHZ_BALANCE_KICK BIT(NOHZ_BALANCE_KICK_BIT)
#define NOHZ_STATS_KICK BIT(NOHZ_STATS_KICK_BIT)
+#define NOHZ_NEWILB_KICK BIT(NOHZ_NEWILB_KICK_BIT)

#define NOHZ_KICK_MASK (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

@@ -2398,6 +2400,11 @@ extern void nohz_balance_exit_idle(struct rq *rq);
static inline void nohz_balance_exit_idle(struct rq *rq) { }
#endif

+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void nohz_run_idle_balance(int cpu);
+#else
+static inline void nohz_run_idle_balance(int cpu) { }
+#endif

#ifdef CONFIG_SMP
static inline
--
2.17.1

2021-02-24 14:53:56

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 1/7 v4] sched/fair: remove update of blocked load from newidle_balance

newidle_balance runs with both preempt and irq disabled, which prevents
local irqs from running during this period. The time spent updating the
blocked load of CPUs varies with the number of CPU cgroups that have
non-decayed load and extends this critical section to an uncontrolled
length.

Remove the update from newidle_balance and trigger a normal ILB that
will take care of the update instead.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 33 +++++----------------------------
1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a8bd7b13634..0d45b7716384 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,8 +7392,6 @@ enum migration_type {
#define LBF_NEED_BREAK 0x02
#define LBF_DST_PINNED 0x04
#define LBF_SOME_PINNED 0x08
-#define LBF_NOHZ_STATS 0x10
-#define LBF_NOHZ_AGAIN 0x20

struct lb_env {
struct sched_domain *sd;
@@ -8397,9 +8395,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
for_each_cpu_and(i, sched_group_span(group), env->cpus) {
struct rq *rq = cpu_rq(i);

- if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
- env->flags |= LBF_NOHZ_AGAIN;
-
sgs->group_load += cpu_load(rq);
sgs->group_util += cpu_util(i);
sgs->group_runnable += cpu_runnable(rq);
@@ -8940,11 +8935,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sg_lb_stats tmp_sgs;
int sg_status = 0;

-#ifdef CONFIG_NO_HZ_COMMON
- if (env->idle == CPU_NEWLY_IDLE && READ_ONCE(nohz.has_blocked))
- env->flags |= LBF_NOHZ_STATS;
-#endif
-
do {
struct sg_lb_stats *sgs = &tmp_sgs;
int local_group;
@@ -8981,14 +8971,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
/* Tag domain that child domain prefers tasks go to siblings first */
sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;

-#ifdef CONFIG_NO_HZ_COMMON
- if ((env->flags & LBF_NOHZ_AGAIN) &&
- cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
-
- WRITE_ONCE(nohz.next_blocked,
- jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
- }
-#endif

if (env->sd->flags & SD_NUMA)
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10517,16 +10499,11 @@ static void nohz_newidle_balance(struct rq *this_rq)
time_before(jiffies, READ_ONCE(nohz.next_blocked)))
return;

- raw_spin_unlock(&this_rq->lock);
/*
- * This CPU is going to be idle and blocked load of idle CPUs
- * need to be updated. Run the ilb locally as it is a good
- * candidate for ilb instead of waking up another idle CPU.
- * Kick an normal ilb if we failed to do the update.
+ * Blocked load of idle CPUs need to be updated.
+ * Kick an ILB to update statistics.
*/
- if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
- kick_ilb(NOHZ_STATS_KICK);
- raw_spin_lock(&this_rq->lock);
+ kick_ilb(NOHZ_STATS_KICK);
}

#else /* !CONFIG_NO_HZ_COMMON */
@@ -10587,8 +10564,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
update_next_balance(sd, &next_balance);
rcu_read_unlock();

- nohz_newidle_balance(this_rq);
-
goto out;
}

@@ -10654,6 +10629,8 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)

if (pulled_task)
this_rq->idle_stamp = 0;
+ else
+ nohz_newidle_balance(this_rq);

rq_repin_lock(this_rq, rf);

--
2.17.1

2021-02-24 14:55:50

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 2/7 v4] sched/fair: remove unused return of _nohz_idle_balance

The return value of _nohz_idle_balance() is not used anymore, so we can
remove it.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0d45b7716384..e23709f6854b 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10354,10 +10354,8 @@ void nohz_balance_enter_idle(int cpu)
* Internal function that runs load balance for all idle cpus. The load balance
* can be a simple update of blocked load or a complete load balance with
* tasks movement depending of flags.
- * The function returns false if the loop has stopped before running
- * through all idle CPUs.
*/
-static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
enum cpu_idle_type idle)
{
/* Earliest time when we have to do rebalance again */
@@ -10367,7 +10365,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
int update_next_balance = 0;
int this_cpu = this_rq->cpu;
int balance_cpu;
- int ret = false;
struct rq *rq;

SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
@@ -10447,15 +10444,10 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

- /* The full idle balance loop has been done */
- ret = true;
-
abort:
/* There is still blocked load, enable periodic update */
if (has_blocked_load)
WRITE_ONCE(nohz.has_blocked, 1);
-
- return ret;
}

/*
--
2.17.1

2021-02-24 14:56:59

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 7/7 v4] sched/fair: reduce the window for duplicated update

Update last_blocked_load_update_tick at the start of the update to reduce
the possibility of another cpu starting the same update one more time.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 46c220a4f7ed..38a1297edd76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7852,16 +7852,20 @@ static inline bool others_have_blocked(struct rq *rq)
return false;
}

-static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+static inline void update_blocked_load_tick(struct rq *rq)
{
- rq->last_blocked_load_update_tick = jiffies;
+ WRITE_ONCE(rq->last_blocked_load_update_tick, jiffies);
+}

+static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+{
if (!has_blocked)
rq->has_blocked_load = 0;
}
#else
static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq) { return false; }
static inline bool others_have_blocked(struct rq *rq) { return false; }
+static inline void update_blocked_load_tick(struct rq *rq) {}
static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {}
#endif

@@ -8022,6 +8026,7 @@ static void update_blocked_averages(int cpu)
struct rq_flags rf;

rq_lock_irqsave(rq, &rf);
+ update_blocked_load_tick(rq);
update_rq_clock(rq);

decayed |= __update_blocked_others(rq, &done);
@@ -8363,7 +8368,7 @@ static bool update_nohz_stats(struct rq *rq)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
return true;

update_blocked_averages(cpu);
--
2.17.1

2021-02-24 14:58:10

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 3/7 v4] sched/fair: remove unused parameter of update_nohz_stats

Idle load balance is the only user of update_nohz_stats() and doesn't use
the force parameter. Remove it.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e23709f6854b..f52f4dd3fb9e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
return group_has_spare;
}

-static bool update_nohz_stats(struct rq *rq, bool force)
+static bool update_nohz_stats(struct rq *rq)
{
#ifdef CONFIG_NO_HZ_COMMON
unsigned int cpu = rq->cpu;
@@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, rq->last_blocked_load_update_tick))
return true;

update_blocked_averages(cpu);
@@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

rq = cpu_rq(balance_cpu);

- has_blocked_load |= update_nohz_stats(rq, true);
+ has_blocked_load |= update_nohz_stats(rq);

/*
* If time for next balance is due,
--
2.17.1

2021-02-24 14:58:28

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 4/7 v4] sched/fair: merge for each idle cpu loop of ILB

Remove the specific case for handling this_cpu outside the for_each_cpu()
loop when running the ILB. Instead, use for_each_cpu_wrap() and start with
the CPU after this_cpu so that the loop still finishes with this_cpu.

update_nohz_stats() is now used for this_cpu too and prevents unnecessary
updates. We don't need a special case for handling the update of
nohz.next_balance for this_cpu anymore because it is now handled by the
loop like the other CPUs.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f52f4dd3fb9e..0323fda07682 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10043,22 +10043,9 @@ static void rebalance_domains(struct rq *rq, enum cpu_idle_type idle)
* When the cpu is attached to null domain for ex, it will not be
* updated.
*/
- if (likely(update_next_balance)) {
+ if (likely(update_next_balance))
rq->next_balance = next_balance;

-#ifdef CONFIG_NO_HZ_COMMON
- /*
- * If this CPU has been elected to perform the nohz idle
- * balance. Other idle CPUs have already rebalanced with
- * nohz_idle_balance() and nohz.next_balance has been
- * updated accordingly. This CPU is now running the idle load
- * balance for itself and we need to update the
- * nohz.next_balance accordingly.
- */
- if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
- nohz.next_balance = rq->next_balance;
-#endif
- }
}

static inline int on_null_domain(struct rq *rq)
@@ -10385,8 +10372,12 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
*/
smp_mb();

- for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
- if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
+ /*
+ * Start with the next CPU after this_cpu so we will end with this_cpu and let a
+ * chance for other idle cpu to pull load.
+ */
+ for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
+ if (!idle_cpu(balance_cpu))
continue;

/*
@@ -10432,15 +10423,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
if (likely(update_next_balance))
nohz.next_balance = next_balance;

- /* Newly idle CPU doesn't need an update */
- if (idle != CPU_NEWLY_IDLE) {
- update_blocked_averages(this_cpu);
- has_blocked_load |= this_rq->has_blocked_load;
- }
-
- if (flags & NOHZ_BALANCE_KICK)
- rebalance_domains(this_rq, CPU_IDLE);
-
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

--
2.17.1

2021-02-24 14:58:49

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH 5/7 v4] sched/fair: reorder newidle_balance pulled_task tests

Reorder the tests and skip useless ones when no load balance has been
performed and rq lock has not been released.

Signed-off-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0323fda07682..586f6ce0d302 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10584,7 +10584,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
if (curr_cost > this_rq->max_idle_balance_cost)
this_rq->max_idle_balance_cost = curr_cost;

-out:
/*
* While browsing the domains, we released the rq lock, a task could
* have been enqueued in the meantime. Since we're not going idle,
@@ -10593,14 +10592,15 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;

- /* Move the next balance forward */
- if (time_after(this_rq->next_balance, next_balance))
- this_rq->next_balance = next_balance;
-
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
pulled_task = -1;

+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
+ this_rq->next_balance = next_balance;
+
if (pulled_task)
this_rq->idle_stamp = 0;
else
--
2.17.1

2021-02-24 17:45:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On Wed, Feb 24, 2021 at 04:57:15PM +0100, Vincent Guittot wrote:
> On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra <[email protected]> wrote:
> >
> > On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote:
> > > Joel reported long preempt- and irq-off sequences in newidle_balance caused
> > > by a large number of CPU cgroups in use and having to be updated. This
> > > patchset moves the update outside newidle_balance, which makes it possible
> > > to abort early during the updates when, for example, an irq is pending.
> > >
> > > Instead of kicking a normal ILB that would wake up a CPU which is already
> > > idle, patch 6 triggers the update of statistics in the idle thread of
> > > the CPU before it selects and enters an idle state.
> >
> > I'm confused... update_blocked_averages(), which calls
> > __update_blocked_fair(), which is the one doing the cgroup iteration
> > thing, runs with rq->lock held, and thus will have IRQs disabled any
> > which way around we turn this thing.
> >
> > Or is the problem that we called nohz_idle_balance(), which does
> > update_nohz_stats() -> update_blocked_averages() for every NOHZ cpu from
> > newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency?
> > Which is now reduced to just NR_CGROUPS ?
>
> Yes we can now abort between each cpu update

OK, shall I add something like:

This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
O(nr_cgroups).

To the changelog of patch #1 ?

2021-02-24 17:54:20

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On Wed, 24 Feb 2021 at 18:41, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Feb 24, 2021 at 04:57:15PM +0100, Vincent Guittot wrote:
> > On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra <[email protected]> wrote:
> > >
> > > On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote:
> > > > Joel reported long preempt- and irq-off sequences in newidle_balance caused
> > > > by a large number of CPU cgroups in use and having to be updated. This
> > > > patchset moves the update outside newidle_balance, which makes it possible
> > > > to abort early during the updates when, for example, an irq is pending.
> > > >
> > > > Instead of kicking a normal ILB that would wake up a CPU which is already
> > > > idle, patch 6 triggers the update of statistics in the idle thread of
> > > > the CPU before it selects and enters an idle state.
> > >
> > > I'm confused... update_blocked_averages(), which calls
> > > __update_blocked_fair(), which is the one doing the cgroup iteration
> > > thing, runs with rq->lock held, and thus will have IRQs disabled any
> > > which way around we turn this thing.
> > >
> > > Or is the problem that we called nohz_idle_balance(), which does
> > > update_nohz_stats() -> update_blocked_averages() for every NOHZ cpu from
> > > newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency?
> > > Which is now reduced to just NR_CGROUPS ?
> >
> > Yes we can now abort between each cpu update
>
> OK, shall I add something like:
>
> This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
> O(nr_cgroups).
>
> To the changelog of patch #1 ?

Yes, good point. This will clarify the range of improvement

2021-02-24 18:19:13

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On Wed, Feb 24, 2021 at 06:51:01PM +0100, Vincent Guittot wrote:

> > OK, shall I add something like:
> >
> > This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
> > O(nr_cgroups).
> >
> > To the changelog of patch #1 ?
>
> Yes, good point. This will clarify the range of improvement

OK, done.

2021-02-24 18:50:25

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On 24/02/21 14:30, Vincent Guittot wrote:
> Joel reported long preempt- and irq-off sequences in newidle_balance caused
> by a large number of CPU cgroups in use and having to be updated. This
> patchset moves the update outside newidle_balance, which makes it possible
> to abort early during the updates when, for example, an irq is pending.
>
> Instead of kicking a normal ILB that would wake up a CPU which is already
> idle, patch 6 triggers the update of statistics in the idle thread of
> the CPU before it selects and enters an idle state.
>
> Changes on v4:
> - Add a dedicated bit for updating blocked load when entering idle.
> This simplifies the management of concurrency with kick_ilb.
>

I believe that solves the issues vs nohz balance.

One last thing for patch 7: mayhaps we could do a tad better to avoid
duplicate updates going through a heapful of leaf cfs rqs, see

http://lore.kernel.org/r/[email protected]


Otherwise, feel free to add to the lot:

Reviewed-by: Valentin Schneider <[email protected]>

2021-02-25 01:02:00

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On Wed, 24 Feb 2021 at 16:54, Peter Zijlstra <[email protected]> wrote:
>
> On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote:
> > Joel reported long preempt- and irq-off sequences in newidle_balance caused
> > by a large number of CPU cgroups in use and having to be updated. This
> > patchset moves the update outside newidle_balance, which makes it possible
> > to abort early during the updates when, for example, an irq is pending.
> >
> > Instead of kicking a normal ILB that would wake up a CPU which is already
> > idle, patch 6 triggers the update of statistics in the idle thread of
> > the CPU before it selects and enters an idle state.
>
> I'm confused... update_blocked_averages(), which calls
> __update_blocked_fair(), which is the one doing the cgroup iteration
> thing, runs with rq->lock held, and thus will have IRQs disabled any
> which way around we turn this thing.
>
> Or is the problem that we called nohz_idle_balance(), which does
> update_nohz_stats() -> update_blocked_averages() for every NOHZ cpu from
> newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency?
> Which is now reduced to just NR_CGROUPS ?

Yes we can now abort between each cpu update
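
The abort point is the need_resched() check in the per-cpu loop of
_nohz_idle_balance(); with this series applied the loop looks roughly like
this (simplified excerpt):

	for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
		if (!idle_cpu(balance_cpu))
			continue;

		/*
		 * If this CPU gets work to do, stop the load balancing
		 * work being done for other CPUs. Next load
		 * balancing owner will pick it up.
		 */
		if (need_resched()) {
			has_blocked_load = true;
			goto abort;
		}

		rq = cpu_rq(balance_cpu);

		/* one cpu's cgroup walk, under rq_lock_irqsave() */
		has_blocked_load |= update_nohz_stats(rq);
		...
	}

so at most one cpu's cgroup walk runs between two opportunities to bail out.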

2021-02-25 01:03:23

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On Wed, Feb 24, 2021 at 02:30:00PM +0100, Vincent Guittot wrote:
> Joel reported long preempt- and irq-off sequences in newidle_balance caused
> by a large number of CPU cgroups in use and having to be updated. This
> patchset moves the update outside newidle_balance, which makes it possible
> to abort early during the updates when, for example, an irq is pending.
>
> Instead of kicking a normal ILB that would wake up a CPU which is already
> idle, patch 6 triggers the update of statistics in the idle thread of
> the CPU before it selects and enters an idle state.

I'm confused... update_blocked_averages(), which calls
__update_blocked_fair(), which is the one doing the cgroup iteration
thing, runs with rq->lock held, and thus will have IRQs disabled any
which way around we turn this thing.

Or is the problem that we called nohz_idle_balance(), which does
update_nohz_stats() -> update_blocked_averages() for every NOHZ cpu from
newidle balance, such that we get NR_NOHZ_CPUS * NR_CGROUPS IRQ latency?
Which is now reduced to just NR_CGROUPS ?

2021-02-25 09:50:31

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

Hi Valentin,

On Wed, 24 Feb 2021 at 19:46, Valentin Schneider
<[email protected]> wrote:
>
> On 24/02/21 14:30, Vincent Guittot wrote:
> > Joel reported long preempt- and irq-off sequences in newidle_balance caused
> > by a large number of CPU cgroups in use and having to be updated. This
> > patchset moves the update outside newidle_balance, which makes it possible
> > to abort early during the updates when, for example, an irq is pending.
> >
> > Instead of kicking a normal ILB that would wake up a CPU which is already
> > idle, patch 6 triggers the update of statistics in the idle thread of
> > the CPU before it selects and enters an idle state.
> >
> > Changes on v4:
> > - Add a dedicated bit for updating blocked load when entering idle.
> > This simplifies the management of concurrency with kick_ilb.
> >
>
> I believe that solves the issues vs nohz balance.
>
> One last thing for patch 7: mayhaps we could do a tad better to avoid
> duplicate updates going through a heapful of leaf cfs rqs, see
>
> http://lore.kernel.org/r/[email protected]

rq->last_blocked_load_update_tick is there only to filter duplicate
updates during _nohz_idle_balance, not for the other normal LB paths.

>
>
> Otherwise, feel free to add to the lot:
>
> Reviewed-by: Valentin Schneider <[email protected]>
>

2021-02-25 12:19:28

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 0/7 v4] move update blocked load outside newidle_balance

On 25/02/21 09:05, Vincent Guittot wrote:
>> One last thing for patch 7: mayhaps we could do a tad better to avoid
>> duplicate updates going through a heapful of leaf cfs rqs, see
>>
>> http://lore.kernel.org/r/[email protected]
>
> rq->last_blocked_load_update_tick is there only to filter duplicate
> updates during _nohz_idle_balance, not for the other normal LB paths.
>

Right, update_nohz_stats() is now only used in _nohz_idle_balance()...
IIUC the pattern being covered here would be a CPUX getting kicked for nohz
stats/balance, then some CPUX-N (so it gets picked by find_new_ilb())
entering idle and getting kicked in turn (less than a jiffy apart).
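
FWIW, that is the window patch 7 narrows: the tick is now written at the
very start of update_blocked_averages(), under the rq lock, so a second
updater's check sees it as early as possible. A sketch based on the patch 7
diff:

	static void update_blocked_averages(int cpu)
	{
		...
		rq_lock_irqsave(rq, &rf);
		/* advertise the update as early as possible */
		update_blocked_load_tick(rq);
		update_rq_clock(rq);
		...
	}

	static bool update_nohz_stats(struct rq *rq)
	{
		...
		/* already updated in the current jiffy, skip the walk */
		if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
			return true;
		...
	}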

2021-03-02 21:21:15

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Trigger the update of blocked load on newly idle cpu

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 1705e3b449f62d957e897239ef6c67ca574acfc6
Gitweb: https://git.kernel.org/tip/1705e3b449f62d957e897239ef6c67ca574acfc6
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:06 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:24 +01:00

sched/fair: Trigger the update of blocked load on newly idle cpu

Instead of waking up a random and already idle CPU, we can take advantage
of this_cpu being about to enter idle to run the ILB and update the
blocked load.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 24 +++++++++++++++++++++---
kernel/sched/idle.c | 6 ++++++
kernel/sched/sched.h | 7 +++++++
4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9dfb34..361974e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -737,7 +737,7 @@ static void nohz_csd_func(void *info)
/*
* Release the rq::nohz_csd.
*/
- flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
+ flags = atomic_fetch_andnot(NOHZ_KICK_MASK | NOHZ_NEWILB_KICK, nohz_flags(cpu));
WARN_ON(!(flags & NOHZ_KICK_MASK));

rq->idle_balance = idle_cpu(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 356a245..e87e1b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
return true;
}

+/*
+ * Check if we need to run the ILB for updating blocked load before entering
+ * idle state.
+ */
+void nohz_run_idle_balance(int cpu)
+{
+ unsigned int flags;
+
+ flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
+
+ /*
+ * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
+ * (ie NOHZ_STATS_KICK set) and will do the same.
+ */
+ if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
+ _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
+}
+
static void nohz_newidle_balance(struct rq *this_rq)
{
int this_cpu = this_rq->cpu;
@@ -10474,10 +10492,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
return;

/*
- * Blocked load of idle CPUs need to be updated.
- * Kick an ILB to update statistics.
+ * Set the need to trigger ILB in order to update blocked load
+ * before entering idle state.
*/
- kick_ilb(NOHZ_STATS_KICK);
+ atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
}

#else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 7199e6f..7a92d60 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -261,6 +261,12 @@ exit_idle:
static void do_idle(void)
{
int cpu = smp_processor_id();
+
+ /*
+ * Check if we need to update blocked load
+ */
+ nohz_run_idle_balance(cpu);
+
/*
* If the arch has a polling bit, we maintain an invariant:
*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522..0ddc9a6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2385,9 +2385,11 @@ extern void cfs_bandwidth_usage_dec(void);
#ifdef CONFIG_NO_HZ_COMMON
#define NOHZ_BALANCE_KICK_BIT 0
#define NOHZ_STATS_KICK_BIT 1
+#define NOHZ_NEWILB_KICK_BIT 2

#define NOHZ_BALANCE_KICK BIT(NOHZ_BALANCE_KICK_BIT)
#define NOHZ_STATS_KICK BIT(NOHZ_STATS_KICK_BIT)
+#define NOHZ_NEWILB_KICK BIT(NOHZ_NEWILB_KICK_BIT)

#define NOHZ_KICK_MASK (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

@@ -2398,6 +2400,11 @@ extern void nohz_balance_exit_idle(struct rq *rq);
static inline void nohz_balance_exit_idle(struct rq *rq) { }
#endif

+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void nohz_run_idle_balance(int cpu);
+#else
+static inline void nohz_run_idle_balance(int cpu) { }
+#endif

#ifdef CONFIG_SMP
static inline

2021-03-02 21:26:28

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused return of _nohz_idle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 81df323258719a0194fadbf4aa93e213a552e460
Gitweb: https://git.kernel.org/tip/81df323258719a0194fadbf4aa93e213a552e460
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:02 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:23 +01:00

sched/fair: Remove unused return of _nohz_idle_balance

The return value of _nohz_idle_balance() is not used anymore, so we can
remove it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 806e16f..6a458e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10354,10 +10354,8 @@ out:
* Internal function that runs load balance for all idle cpus. The load balance
* can be a simple update of blocked load or a complete load balance with
* tasks movement depending of flags.
- * The function returns false if the loop has stopped before running
- * through all idle CPUs.
*/
-static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
enum cpu_idle_type idle)
{
/* Earliest time when we have to do rebalance again */
@@ -10367,7 +10365,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
int update_next_balance = 0;
int this_cpu = this_rq->cpu;
int balance_cpu;
- int ret = false;
struct rq *rq;

SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
@@ -10447,15 +10444,10 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

- /* The full idle balance loop has been done */
- ret = true;
-
abort:
/* There is still blocked load, enable periodic update */
if (has_blocked_load)
WRITE_ONCE(nohz.has_blocked, 1);
-
- return ret;
}

/*

2021-03-02 21:26:29

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused parameter of update_nohz_stats

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 8af3f0fbfbaa3b78bb1fc577ee42c3228f3cc822
Gitweb: https://git.kernel.org/tip/8af3f0fbfbaa3b78bb1fc577ee42c3228f3cc822
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:03 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:23 +01:00

sched/fair: Remove unused parameter of update_nohz_stats

Idle load balance is the only user of update_nohz_stats() and doesn't use
the force parameter. Remove it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a458e9..1b91030 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
return group_has_spare;
}

-static bool update_nohz_stats(struct rq *rq, bool force)
+static bool update_nohz_stats(struct rq *rq)
{
#ifdef CONFIG_NO_HZ_COMMON
unsigned int cpu = rq->cpu;
@@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, rq->last_blocked_load_update_tick))
return true;

update_blocked_averages(cpu);
@@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

rq = cpu_rq(balance_cpu);

- has_blocked_load |= update_nohz_stats(rq, true);
+ has_blocked_load |= update_nohz_stats(rq);

/*
* If time for next balance is due,

2021-03-02 21:36:49

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove update of blocked load from newidle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 06a35afe89800789fc47ca5c41fbe435cc77d8e0
Gitweb: https://git.kernel.org/tip/06a35afe89800789fc47ca5c41fbe435cc77d8e0
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:01 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:23 +01:00

sched/fair: Remove update of blocked load from newidle_balance

newidle_balance runs with both preempt and irq disabled, which prevents
local irqs from running during this period. The time spent updating the
blocked load of CPUs varies with the number of CPU cgroups that have
non-decayed load and extends this critical section to an uncontrolled
length.

Remove the update from newidle_balance and trigger a normal ILB that
will take care of the update instead.

This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
O(nr_cgroups).

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 33 +++++----------------------------
1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb..806e16f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,8 +7392,6 @@ enum migration_type {
#define LBF_NEED_BREAK 0x02
#define LBF_DST_PINNED 0x04
#define LBF_SOME_PINNED 0x08
-#define LBF_NOHZ_STATS 0x10
-#define LBF_NOHZ_AGAIN 0x20

struct lb_env {
struct sched_domain *sd;
@@ -8397,9 +8395,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
for_each_cpu_and(i, sched_group_span(group), env->cpus) {
struct rq *rq = cpu_rq(i);

- if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
- env->flags |= LBF_NOHZ_AGAIN;
-
sgs->group_load += cpu_load(rq);
sgs->group_util += cpu_util(i);
sgs->group_runnable += cpu_runnable(rq);
@@ -8940,11 +8935,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sg_lb_stats tmp_sgs;
int sg_status = 0;

-#ifdef CONFIG_NO_HZ_COMMON
- if (env->idle == CPU_NEWLY_IDLE && READ_ONCE(nohz.has_blocked))
- env->flags |= LBF_NOHZ_STATS;
-#endif
-
do {
struct sg_lb_stats *sgs = &tmp_sgs;
int local_group;
@@ -8981,14 +8971,6 @@ next_group:
/* Tag domain that child domain prefers tasks go to siblings first */
sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;

-#ifdef CONFIG_NO_HZ_COMMON
- if ((env->flags & LBF_NOHZ_AGAIN) &&
- cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
-
- WRITE_ONCE(nohz.next_blocked,
- jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
- }
-#endif

if (env->sd->flags & SD_NUMA)
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10517,16 +10499,11 @@ static void nohz_newidle_balance(struct rq *this_rq)
time_before(jiffies, READ_ONCE(nohz.next_blocked)))
return;

- raw_spin_unlock(&this_rq->lock);
/*
- * This CPU is going to be idle and blocked load of idle CPUs
- * need to be updated. Run the ilb locally as it is a good
- * candidate for ilb instead of waking up another idle CPU.
- * Kick an normal ilb if we failed to do the update.
+ * Blocked load of idle CPUs need to be updated.
+ * Kick an ILB to update statistics.
*/
- if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
- kick_ilb(NOHZ_STATS_KICK);
- raw_spin_lock(&this_rq->lock);
+ kick_ilb(NOHZ_STATS_KICK);
}

#else /* !CONFIG_NO_HZ_COMMON */
@@ -10587,8 +10564,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
update_next_balance(sd, &next_balance);
rcu_read_unlock();

- nohz_newidle_balance(this_rq);
-
goto out;
}

@@ -10654,6 +10629,8 @@ out:

if (pulled_task)
this_rq->idle_stamp = 0;
+ else
+ nohz_newidle_balance(this_rq);

rq_repin_lock(this_rq, rf);

2021-03-04 06:05:13

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Merge for each idle cpu loop of ILB

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c0d2f3b54ed88ce1079f8ffb094205d3f578a9bb
Gitweb: https://git.kernel.org/tip/c0d2f3b54ed88ce1079f8ffb094205d3f578a9bb
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:04 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:24 +01:00

sched/fair: Merge for each idle cpu loop of ILB

Remove the specific case for handling this_cpu outside the for_each_cpu()
loop when running the ILB. Instead, use for_each_cpu_wrap() and start with
the CPU after this_cpu so that the loop still finishes with this_cpu.

update_nohz_stats() is now used for this_cpu too and prevents unnecessary
updates. We don't need a special case for handling the update of
nohz.next_balance for this_cpu anymore because it is now handled by the
loop like the other CPUs.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1b91030..3c00918 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10043,22 +10043,9 @@ out:
* When the cpu is attached to null domain for ex, it will not be
* updated.
*/
- if (likely(update_next_balance)) {
+ if (likely(update_next_balance))
rq->next_balance = next_balance;

-#ifdef CONFIG_NO_HZ_COMMON
- /*
- * If this CPU has been elected to perform the nohz idle
- * balance. Other idle CPUs have already rebalanced with
- * nohz_idle_balance() and nohz.next_balance has been
- * updated accordingly. This CPU is now running the idle load
- * balance for itself and we need to update the
- * nohz.next_balance accordingly.
- */
- if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
- nohz.next_balance = rq->next_balance;
-#endif
- }
}

static inline int on_null_domain(struct rq *rq)
@@ -10385,8 +10372,12 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
*/
smp_mb();

- for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
- if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
+ /*
+ * Start with the next CPU after this_cpu so we will end with this_cpu and let a
+ * chance for other idle cpu to pull load.
+ */
+ for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
+ if (!idle_cpu(balance_cpu))
continue;

/*
@@ -10432,15 +10423,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
if (likely(update_next_balance))
nohz.next_balance = next_balance;

- /* Newly idle CPU doesn't need an update */
- if (idle != CPU_NEWLY_IDLE) {
- update_blocked_averages(this_cpu);
- has_blocked_load |= this_rq->has_blocked_load;
- }
-
- if (flags & NOHZ_BALANCE_KICK)
- rebalance_domains(this_rq, CPU_IDLE);
-
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

2021-03-04 06:05:23

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reduce the window for duplicated update

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 839ffb99d94f930fecbdee2fdfb883b10c30326b
Gitweb: https://git.kernel.org/tip/839ffb99d94f930fecbdee2fdfb883b10c30326b
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:07 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:25 +01:00

sched/fair: Reduce the window for duplicated update

Update last_blocked_load_update_tick at the start of the update to reduce
the possibility of another cpu starting the same update one more time.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e87e1b3..f1b55f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7852,16 +7852,20 @@ static inline bool others_have_blocked(struct rq *rq)
return false;
}

-static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+static inline void update_blocked_load_tick(struct rq *rq)
{
- rq->last_blocked_load_update_tick = jiffies;
+ WRITE_ONCE(rq->last_blocked_load_update_tick, jiffies);
+}

+static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+{
if (!has_blocked)
rq->has_blocked_load = 0;
}
#else
static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq) { return false; }
static inline bool others_have_blocked(struct rq *rq) { return false; }
+static inline void update_blocked_load_tick(struct rq *rq) {}
static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {}
#endif

@@ -8022,6 +8026,7 @@ static void update_blocked_averages(int cpu)
struct rq_flags rf;

rq_lock_irqsave(rq, &rf);
+ update_blocked_load_tick(rq);
update_rq_clock(rq);

decayed |= __update_blocked_others(rq, &done);
@@ -8363,7 +8368,7 @@ static bool update_nohz_stats(struct rq *rq)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
return true;

update_blocked_averages(cpu);

2021-03-04 06:06:28

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reorder newidle_balance pulled_task tests

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 0c067f38e1b9640e9121fb7e0bb38fa8f867a248
Gitweb: https://git.kernel.org/tip/0c067f38e1b9640e9121fb7e0bb38fa8f867a248
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:05 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Mon, 01 Mar 2021 18:17:24 +01:00

sched/fair: Reorder newidle_balance pulled_task tests

Reorder the tests and skip useless ones when no load balance has been
performed and rq lock has not been released.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c00918..356a245 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10584,7 +10584,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
if (curr_cost > this_rq->max_idle_balance_cost)
this_rq->max_idle_balance_cost = curr_cost;

-out:
/*
* While browsing the domains, we released the rq lock, a task could
* have been enqueued in the meantime. Since we're not going idle,
@@ -10593,14 +10592,15 @@ out:
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;

- /* Move the next balance forward */
- if (time_after(this_rq->next_balance, next_balance))
- this_rq->next_balance = next_balance;
-
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
pulled_task = -1;

+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
+ this_rq->next_balance = next_balance;
+
if (pulled_task)
this_rq->idle_stamp = 0;
else

2021-03-04 09:10:57

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused return of _nohz_idle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: f2c0af1dabdae4674fb7ddba0ac88ca78d0fe675
Gitweb: https://git.kernel.org/tip/f2c0af1dabdae4674fb7ddba0ac88ca78d0fe675
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:02 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Remove unused return of _nohz_idle_balance

The return value of _nohz_idle_balance() is not used anymore, so we can
remove it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 806e16f..6a458e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10354,10 +10354,8 @@ out:
* Internal function that runs load balance for all idle cpus. The load balance
* can be a simple update of blocked load or a complete load balance with
* tasks movement depending of flags.
- * The function returns false if the loop has stopped before running
- * through all idle CPUs.
*/
-static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
enum cpu_idle_type idle)
{
/* Earliest time when we have to do rebalance again */
@@ -10367,7 +10365,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
int update_next_balance = 0;
int this_cpu = this_rq->cpu;
int balance_cpu;
- int ret = false;
struct rq *rq;

SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
@@ -10447,15 +10444,10 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

- /* The full idle balance loop has been done */
- ret = true;
-
abort:
/* There is still blocked load, enable periodic update */
if (has_blocked_load)
WRITE_ONCE(nohz.has_blocked, 1);
-
- return ret;
}

/*

2021-03-04 09:11:12

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reorder newidle_balance pulled_task tests

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 053192dea58da994fb3dd7ad235440accf292a08
Gitweb: https://git.kernel.org/tip/053192dea58da994fb3dd7ad235440accf292a08
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:05 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Reorder newidle_balance pulled_task tests

Reorder the tests and skip useless ones when no load balance has been
performed and rq lock has not been released.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c00918..356a245 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10584,7 +10584,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
if (curr_cost > this_rq->max_idle_balance_cost)
this_rq->max_idle_balance_cost = curr_cost;

-out:
/*
* While browsing the domains, we released the rq lock, a task could
* have been enqueued in the meantime. Since we're not going idle,
@@ -10593,14 +10592,15 @@ out:
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;

- /* Move the next balance forward */
- if (time_after(this_rq->next_balance, next_balance))
- this_rq->next_balance = next_balance;
-
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
pulled_task = -1;

+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
+ this_rq->next_balance = next_balance;
+
if (pulled_task)
this_rq->idle_stamp = 0;
else

2021-03-04 09:16:02

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Merge for each idle cpu loop of ILB

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2aa7f2f6d1e4308b81bef079091561445b9cb949
Gitweb: https://git.kernel.org/tip/2aa7f2f6d1e4308b81bef079091561445b9cb949
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:04 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Merge for each idle cpu loop of ILB

Remove the specific case for handling this_cpu outside the for_each_cpu()
loop when running the ILB. Instead, use for_each_cpu_wrap() and start with
the CPU after this_cpu so that the loop still finishes with this_cpu.

update_nohz_stats() is now used for this_cpu too and prevents unnecessary
updates. We don't need a special case for handling the update of
nohz.next_balance for this_cpu anymore because it is now handled by the
loop like the other CPUs.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1b91030..3c00918 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10043,22 +10043,9 @@ out:
* When the cpu is attached to null domain for ex, it will not be
* updated.
*/
- if (likely(update_next_balance)) {
+ if (likely(update_next_balance))
rq->next_balance = next_balance;

-#ifdef CONFIG_NO_HZ_COMMON
- /*
- * If this CPU has been elected to perform the nohz idle
- * balance. Other idle CPUs have already rebalanced with
- * nohz_idle_balance() and nohz.next_balance has been
- * updated accordingly. This CPU is now running the idle load
- * balance for itself and we need to update the
- * nohz.next_balance accordingly.
- */
- if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
- nohz.next_balance = rq->next_balance;
-#endif
- }
}

static inline int on_null_domain(struct rq *rq)
@@ -10385,8 +10372,12 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
*/
smp_mb();

- for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
- if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
+ /*
+ * Start with the next CPU after this_cpu so we will end with this_cpu and let a
+ * chance for other idle cpu to pull load.
+ */
+ for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
+ if (!idle_cpu(balance_cpu))
continue;

/*
@@ -10432,15 +10423,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
if (likely(update_next_balance))
nohz.next_balance = next_balance;

- /* Newly idle CPU doesn't need an update */
- if (idle != CPU_NEWLY_IDLE) {
- update_blocked_averages(this_cpu);
- has_blocked_load |= this_rq->has_blocked_load;
- }
-
- if (flags & NOHZ_BALANCE_KICK)
- rebalance_domains(this_rq, CPU_IDLE);
-
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

2021-03-04 09:21:52

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove update of blocked load from newidle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 1690607f4232c120a2d6ff1f9d0766551d9609f1
Gitweb: https://git.kernel.org/tip/1690607f4232c120a2d6ff1f9d0766551d9609f1
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:01 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Remove update of blocked load from newidle_balance

newidle_balance runs with both preempt and irq disabled, which prevents
local irqs from running during this period. The time spent updating the
blocked load of CPUs varies with the number of CPU cgroups that have
non-decayed load and extends this critical section to an uncontrolled
length.

Remove the update from newidle_balance and trigger a normal ILB that
will take care of the update instead.

This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
O(nr_cgroups).

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 33 +++++----------------------------
1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb..806e16f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,8 +7392,6 @@ enum migration_type {
#define LBF_NEED_BREAK 0x02
#define LBF_DST_PINNED 0x04
#define LBF_SOME_PINNED 0x08
-#define LBF_NOHZ_STATS 0x10
-#define LBF_NOHZ_AGAIN 0x20

struct lb_env {
struct sched_domain *sd;
@@ -8397,9 +8395,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
for_each_cpu_and(i, sched_group_span(group), env->cpus) {
struct rq *rq = cpu_rq(i);

- if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
- env->flags |= LBF_NOHZ_AGAIN;
-
sgs->group_load += cpu_load(rq);
sgs->group_util += cpu_util(i);
sgs->group_runnable += cpu_runnable(rq);
@@ -8940,11 +8935,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sg_lb_stats tmp_sgs;
int sg_status = 0;

-#ifdef CONFIG_NO_HZ_COMMON
- if (env->idle == CPU_NEWLY_IDLE && READ_ONCE(nohz.has_blocked))
- env->flags |= LBF_NOHZ_STATS;
-#endif
-
do {
struct sg_lb_stats *sgs = &tmp_sgs;
int local_group;
@@ -8981,14 +8971,6 @@ next_group:
/* Tag domain that child domain prefers tasks go to siblings first */
sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;

-#ifdef CONFIG_NO_HZ_COMMON
- if ((env->flags & LBF_NOHZ_AGAIN) &&
- cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
-
- WRITE_ONCE(nohz.next_blocked,
- jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
- }
-#endif

if (env->sd->flags & SD_NUMA)
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10517,16 +10499,11 @@ static void nohz_newidle_balance(struct rq *this_rq)
time_before(jiffies, READ_ONCE(nohz.next_blocked)))
return;

- raw_spin_unlock(&this_rq->lock);
/*
- * This CPU is going to be idle and blocked load of idle CPUs
- * need to be updated. Run the ilb locally as it is a good
- * candidate for ilb instead of waking up another idle CPU.
- * Kick an normal ilb if we failed to do the update.
+ * Blocked load of idle CPUs need to be updated.
+ * Kick an ILB to update statistics.
*/
- if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
- kick_ilb(NOHZ_STATS_KICK);
- raw_spin_lock(&this_rq->lock);
+ kick_ilb(NOHZ_STATS_KICK);
}

#else /* !CONFIG_NO_HZ_COMMON */
@@ -10587,8 +10564,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
update_next_balance(sd, &next_balance);
rcu_read_unlock();

- nohz_newidle_balance(this_rq);
-
goto out;
}

@@ -10654,6 +10629,8 @@ out:

if (pulled_task)
this_rq->idle_stamp = 0;
+ else
+ nohz_newidle_balance(this_rq);

rq_repin_lock(this_rq, rf);

2021-03-04 09:27:01

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused parameter of update_nohz_stats

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 21c5d27a4c5d9fddb2c35ccdd5cddc11b75f753d
Gitweb: https://git.kernel.org/tip/21c5d27a4c5d9fddb2c35ccdd5cddc11b75f753d
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:03 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Remove unused parameter of update_nohz_stats

Idle load balance is the only user of update_nohz_stats() and doesn't use
the force parameter. Remove it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a458e9..1b91030 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
return group_has_spare;
}

-static bool update_nohz_stats(struct rq *rq, bool force)
+static bool update_nohz_stats(struct rq *rq)
{
#ifdef CONFIG_NO_HZ_COMMON
unsigned int cpu = rq->cpu;
@@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, rq->last_blocked_load_update_tick))
return true;

update_blocked_averages(cpu);
@@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

rq = cpu_rq(balance_cpu);

- has_blocked_load |= update_nohz_stats(rq, true);
+ has_blocked_load |= update_nohz_stats(rq);

/*
* If time for next balance is due,

2021-03-04 21:46:57

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reduce the window for duplicated update

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 780eec5b50930b34e2f096b4dce5368d90497b55
Gitweb: https://git.kernel.org/tip/780eec5b50930b34e2f096b4dce5368d90497b55
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:07 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Reduce the window for duplicated update

Update last_blocked_load_update_tick at the start of
update_blocked_averages() to reduce the window in which another CPU can
start the same update one more time.
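
The pattern can be illustrated with a generic userspace sketch (not kernel
code; the names and the simplified staleness check are made up for the
example): publishing the "last updated" stamp before doing the expensive
work shrinks the window in which a second CPU can pass the check and
repeat that work.

#include <stdatomic.h>
#include <stdbool.h>

static _Atomic unsigned long last_update;

/* Returns true if this caller performed the expensive update. */
static bool maybe_update(unsigned long now, void (*expensive_update)(void))
{
        /* Staleness check, analogous to the time_after() test in the patch. */
        if (now <= atomic_load_explicit(&last_update, memory_order_relaxed))
                return false;

        /*
         * Stamp first: a concurrent caller now fails the check above almost
         * immediately instead of only after the whole update has finished.
         */
        atomic_store_explicit(&last_update, now, memory_order_relaxed);
        expensive_update();
        return true;
}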

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e87e1b3..f1b55f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7852,16 +7852,20 @@ static inline bool others_have_blocked(struct rq *rq)
return false;
}

-static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+static inline void update_blocked_load_tick(struct rq *rq)
{
- rq->last_blocked_load_update_tick = jiffies;
+ WRITE_ONCE(rq->last_blocked_load_update_tick, jiffies);
+}

+static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+{
if (!has_blocked)
rq->has_blocked_load = 0;
}
#else
static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq) { return false; }
static inline bool others_have_blocked(struct rq *rq) { return false; }
+static inline void update_blocked_load_tick(struct rq *rq) {}
static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {}
#endif

@@ -8022,6 +8026,7 @@ static void update_blocked_averages(int cpu)
struct rq_flags rf;

rq_lock_irqsave(rq, &rf);
+ update_blocked_load_tick(rq);
update_rq_clock(rq);

decayed |= __update_blocked_others(rq, &done);
@@ -8363,7 +8368,7 @@ static bool update_nohz_stats(struct rq *rq)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
return true;

update_blocked_averages(cpu);

2021-03-04 21:48:32

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Trigger the update of blocked load on newly idle cpu

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 63dbe695827f0f612a0cdbc82a43a974bcd536cd
Gitweb: https://git.kernel.org/tip/63dbe695827f0f612a0cdbc82a43a974bcd536cd
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:06 +01:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 03 Mar 2021 10:32:59 +01:00

sched/fair: Trigger the update of blocked load on newly idle cpu

Instead of waking up a random and already idle CPU, we can take advantage
of this_cpu being about to enter idle to run the ILB and update the
blocked load.
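
A toy userspace model of the handshake this patch introduces may help
(hypothetical demo code, not from the patch; only the flag names and the
"sole bit set" test are taken from it, and the need_resched() check is
omitted): the newly idle CPU sets NOHZ_NEWILB_KICK before entering idle,
and the idle loop atomically clears it, running the blocked-load update
locally only when no other kick is already pending for that CPU.

#include <stdatomic.h>
#include <stdio.h>

#define NOHZ_STATS_KICK    0x2u
#define NOHZ_NEWILB_KICK   0x4u

static _Atomic unsigned int nohz_flags_demo;

/* Stands in for nohz_newidle_balance(): ask for a local update before idle. */
static void newidle_balance_demo(void)
{
        atomic_fetch_or(&nohz_flags_demo, NOHZ_NEWILB_KICK);
}

/* Stands in for nohz_run_idle_balance() called from do_idle(). */
static void run_idle_balance_demo(void)
{
        unsigned int flags = atomic_fetch_and(&nohz_flags_demo, ~NOHZ_NEWILB_KICK);

        if (flags == NOHZ_NEWILB_KICK)
                printf("update blocked load locally\n");
        else
                printf("skip, pending flags 0x%x will be handled elsewhere\n", flags);
}

int main(void)
{
        newidle_balance_demo();
        run_idle_balance_demo();        /* runs the update locally */

        atomic_fetch_or(&nohz_flags_demo, NOHZ_STATS_KICK);    /* softirq ILB pending */
        newidle_balance_demo();
        run_idle_balance_demo();        /* skips, the pending ILB will do it */
        return 0;
}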

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 24 +++++++++++++++++++++---
kernel/sched/idle.c | 6 ++++++
kernel/sched/sched.h | 7 +++++++
4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9dfb34..361974e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -737,7 +737,7 @@ static void nohz_csd_func(void *info)
/*
* Release the rq::nohz_csd.
*/
- flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
+ flags = atomic_fetch_andnot(NOHZ_KICK_MASK | NOHZ_NEWILB_KICK, nohz_flags(cpu));
WARN_ON(!(flags & NOHZ_KICK_MASK));

rq->idle_balance = idle_cpu(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 356a245..e87e1b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
return true;
}

+/*
+ * Check if we need to run the ILB for updating blocked load before entering
+ * idle state.
+ */
+void nohz_run_idle_balance(int cpu)
+{
+ unsigned int flags;
+
+ flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
+
+ /*
+ * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
+ * (ie NOHZ_STATS_KICK set) and will do the same.
+ */
+ if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
+ _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
+}
+
static void nohz_newidle_balance(struct rq *this_rq)
{
int this_cpu = this_rq->cpu;
@@ -10474,10 +10492,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
return;

/*
- * Blocked load of idle CPUs need to be updated.
- * Kick an ILB to update statistics.
+ * Set the need to trigger ILB in order to update blocked load
+ * before entering idle state.
*/
- kick_ilb(NOHZ_STATS_KICK);
+ atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
}

#else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 7199e6f..7a92d60 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -261,6 +261,12 @@ exit_idle:
static void do_idle(void)
{
int cpu = smp_processor_id();
+
+ /*
+ * Check if we need to update blocked load
+ */
+ nohz_run_idle_balance(cpu);
+
/*
* If the arch has a polling bit, we maintain an invariant:
*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522..0ddc9a6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2385,9 +2385,11 @@ extern void cfs_bandwidth_usage_dec(void);
#ifdef CONFIG_NO_HZ_COMMON
#define NOHZ_BALANCE_KICK_BIT 0
#define NOHZ_STATS_KICK_BIT 1
+#define NOHZ_NEWILB_KICK_BIT 2

#define NOHZ_BALANCE_KICK BIT(NOHZ_BALANCE_KICK_BIT)
#define NOHZ_STATS_KICK BIT(NOHZ_STATS_KICK_BIT)
+#define NOHZ_NEWILB_KICK BIT(NOHZ_NEWILB_KICK_BIT)

#define NOHZ_KICK_MASK (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

@@ -2398,6 +2400,11 @@ extern void nohz_balance_exit_idle(struct rq *rq);
static inline void nohz_balance_exit_idle(struct rq *rq) { }
#endif

+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void nohz_run_idle_balance(int cpu);
+#else
+static inline void nohz_run_idle_balance(int cpu) { }
+#endif

#ifdef CONFIG_SMP
static inline

2021-03-05 13:56:19

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 6/7 v4] sched/fair: trigger the update of blocked load on newly idle cpu

On 02/24/21 14:30, Vincent Guittot wrote:
> +/*
> + * Check if we need to run the ILB for updating blocked load before entering
> + * idle state.
> + */
> +void nohz_run_idle_balance(int cpu)
> +{
> + unsigned int flags;
> +
> + flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
> +
> + /*
> + * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
> + * (ie NOHZ_STATS_KICK set) and will do the same.
> + */
> + if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
> + _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
> +}

nit: need_resched() implies this_cpu, but the function signature implies you
could operate on any CPU. Do need_resched() outside this function or make
the function read smp_processor_id() itself without taking an arg?
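
(For illustration, one possible shape of the second option; this is a
hypothetical sketch, not a patch posted in this thread:)

void nohz_run_idle_balance(void)
{
        int cpu = smp_processor_id();
        unsigned int flags;

        flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));

        /* need_resched() now unambiguously refers to the same CPU. */
        if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
                _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
}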

Thanks

--
Qais Yousef

2021-03-05 14:03:31

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 3/7 v4] sched/fair: remove unused parameter of update_nohz_stats

On 02/24/21 14:30, Vincent Guittot wrote:
> idle load balance is the only user of update_nohz_stats and doesn't use
> force parameter. Remove it
>
> Signed-off-by: Vincent Guittot <[email protected]>
> ---
> kernel/sched/fair.c | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index e23709f6854b..f52f4dd3fb9e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
> return group_has_spare;
> }
>
> -static bool update_nohz_stats(struct rq *rq, bool force)
> +static bool update_nohz_stats(struct rq *rq)
> {
> #ifdef CONFIG_NO_HZ_COMMON
> unsigned int cpu = rq->cpu;
> @@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
> if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
> return false;
>
> - if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
> + if (!time_after(jiffies, rq->last_blocked_load_update_tick))
> return true;
>
> update_blocked_averages(cpu);
> @@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
>
> rq = cpu_rq(balance_cpu);
>
> - has_blocked_load |= update_nohz_stats(rq, true);
> + has_blocked_load |= update_nohz_stats(rq);

I think Dietmar commented on this on v1. There's a change in behavior here
AFAICT. Worth expanding the changelog to explain that this will be rate limited
and why it's okay? It'll help a lost soul like me who doesn't have the ins and
outs of this code carved in their head :-)

Thanks

--
Qais Yousef


>
> /*
> * If time for next balance is due,
> --
> 2.17.1
>

2021-03-06 11:43:49

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reduce the window for duplicated update

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 39b6a429c30482c349f1bb3746470fe473cbdb0f
Gitweb: https://git.kernel.org/tip/39b6a429c30482c349f1bb3746470fe473cbdb0f
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:07 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:22 +01:00

sched/fair: Reduce the window for duplicated update

Update last_blocked_load_update_tick at the start of
update_blocked_averages() to reduce the window in which another CPU can
start the same update one more time.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index e87e1b3..f1b55f9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7852,16 +7852,20 @@ static inline bool others_have_blocked(struct rq *rq)
return false;
}

-static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+static inline void update_blocked_load_tick(struct rq *rq)
{
- rq->last_blocked_load_update_tick = jiffies;
+ WRITE_ONCE(rq->last_blocked_load_update_tick, jiffies);
+}

+static inline void update_blocked_load_status(struct rq *rq, bool has_blocked)
+{
if (!has_blocked)
rq->has_blocked_load = 0;
}
#else
static inline bool cfs_rq_has_blocked(struct cfs_rq *cfs_rq) { return false; }
static inline bool others_have_blocked(struct rq *rq) { return false; }
+static inline void update_blocked_load_tick(struct rq *rq) {}
static inline void update_blocked_load_status(struct rq *rq, bool has_blocked) {}
#endif

@@ -8022,6 +8026,7 @@ static void update_blocked_averages(int cpu)
struct rq_flags rf;

rq_lock_irqsave(rq, &rf);
+ update_blocked_load_tick(rq);
update_rq_clock(rq);

decayed |= __update_blocked_others(rq, &done);
@@ -8363,7 +8368,7 @@ static bool update_nohz_stats(struct rq *rq)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, READ_ONCE(rq->last_blocked_load_update_tick)))
return true;

update_blocked_averages(cpu);

2021-03-06 11:45:10

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Merge for each idle cpu loop of ILB

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 7a82e5f52a3506bc35a4dc04d53ad2c9daf82e7f
Gitweb: https://git.kernel.org/tip/7a82e5f52a3506bc35a4dc04d53ad2c9daf82e7f
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:04 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/fair: Merge for each idle cpu loop of ILB

Remove the specific case for handling this_cpu outside the for_each_cpu()
loop when running the ILB. Instead, use for_each_cpu_wrap() and start with
the next CPU after this_cpu so that the loop still finishes with this_cpu.

update_nohz_stats() is now used for this_cpu too and prevents unnecessary
updates. We no longer need a special case for updating nohz.next_balance
for this_cpu because it is now handled by the loop like the other CPUs.
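
A minimal illustration of the resulting iteration order (the CPU numbers
are assumed for the example; the snippet only relies on the documented
for_each_cpu_wrap() semantics): with four online CPUs all idle and
this_cpu == 2, starting at this_cpu + 1 visits 3, 0, 1 and finally 2, so
the other idle CPUs get a chance to pull load before this_cpu balances
itself.

        int cpu;

        /* Illustrative only: walk a 4-CPU mask starting after CPU 2. */
        for_each_cpu_wrap(cpu, cpu_online_mask, 3)
                printk(KERN_DEBUG "%d ", cpu);  /* prints: 3 0 1 2 */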

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 32 +++++++-------------------------
1 file changed, 7 insertions(+), 25 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1b91030..3c00918 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10043,22 +10043,9 @@ out:
* When the cpu is attached to null domain for ex, it will not be
* updated.
*/
- if (likely(update_next_balance)) {
+ if (likely(update_next_balance))
rq->next_balance = next_balance;

-#ifdef CONFIG_NO_HZ_COMMON
- /*
- * If this CPU has been elected to perform the nohz idle
- * balance. Other idle CPUs have already rebalanced with
- * nohz_idle_balance() and nohz.next_balance has been
- * updated accordingly. This CPU is now running the idle load
- * balance for itself and we need to update the
- * nohz.next_balance accordingly.
- */
- if ((idle == CPU_IDLE) && time_after(nohz.next_balance, rq->next_balance))
- nohz.next_balance = rq->next_balance;
-#endif
- }
}

static inline int on_null_domain(struct rq *rq)
@@ -10385,8 +10372,12 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
*/
smp_mb();

- for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
- if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
+ /*
+ * Start with the next CPU after this_cpu so we will end with this_cpu and let a
+ * chance for other idle cpu to pull load.
+ */
+ for_each_cpu_wrap(balance_cpu, nohz.idle_cpus_mask, this_cpu+1) {
+ if (!idle_cpu(balance_cpu))
continue;

/*
@@ -10432,15 +10423,6 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
if (likely(update_next_balance))
nohz.next_balance = next_balance;

- /* Newly idle CPU doesn't need an update */
- if (idle != CPU_NEWLY_IDLE) {
- update_blocked_averages(this_cpu);
- has_blocked_load |= this_rq->has_blocked_load;
- }
-
- if (flags & NOHZ_BALANCE_KICK)
- rebalance_domains(this_rq, CPU_IDLE);
-
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

2021-03-06 11:45:31

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused return of _nohz_idle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: ab2dde5e98db23387147fb4e7a52b6cf8141cdb3
Gitweb: https://git.kernel.org/tip/ab2dde5e98db23387147fb4e7a52b6cf8141cdb3
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:02 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/fair: Remove unused return of _nohz_idle_balance

The return value of _nohz_idle_balance() is not used anymore, so remove
it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +---------
1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 806e16f..6a458e9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10354,10 +10354,8 @@ out:
* Internal function that runs load balance for all idle cpus. The load balance
* can be a simple update of blocked load or a complete load balance with
* tasks movement depending of flags.
- * The function returns false if the loop has stopped before running
- * through all idle CPUs.
*/
-static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
+static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
enum cpu_idle_type idle)
{
/* Earliest time when we have to do rebalance again */
@@ -10367,7 +10365,6 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
int update_next_balance = 0;
int this_cpu = this_rq->cpu;
int balance_cpu;
- int ret = false;
struct rq *rq;

SCHED_WARN_ON((flags & NOHZ_KICK_MASK) == NOHZ_BALANCE_KICK);
@@ -10447,15 +10444,10 @@ static bool _nohz_idle_balance(struct rq *this_rq, unsigned int flags,
WRITE_ONCE(nohz.next_blocked,
now + msecs_to_jiffies(LOAD_AVG_PERIOD));

- /* The full idle balance loop has been done */
- ret = true;
-
abort:
/* There is still blocked load, enable periodic update */
if (has_blocked_load)
WRITE_ONCE(nohz.has_blocked, 1);
-
- return ret;
}

/*

2021-03-06 11:46:53

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Trigger the update of blocked load on newly idle cpu

The following commit has been merged into the sched/core branch of tip:

Commit-ID: c6f886546cb8a38617cdbe755fe50d3acd2463e4
Gitweb: https://git.kernel.org/tip/c6f886546cb8a38617cdbe755fe50d3acd2463e4
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:06 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:22 +01:00

sched/fair: Trigger the update of blocked load on newly idle cpu

Instead of waking up a random and already idle CPU, we can take advantage
of this_cpu being about to enter idle to run the ILB and update the
blocked load.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 2 +-
kernel/sched/fair.c | 24 +++++++++++++++++++++---
kernel/sched/idle.c | 6 ++++++
kernel/sched/sched.h | 7 +++++++
4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f9dfb34..361974e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -737,7 +737,7 @@ static void nohz_csd_func(void *info)
/*
* Release the rq::nohz_csd.
*/
- flags = atomic_fetch_andnot(NOHZ_KICK_MASK, nohz_flags(cpu));
+ flags = atomic_fetch_andnot(NOHZ_KICK_MASK | NOHZ_NEWILB_KICK, nohz_flags(cpu));
WARN_ON(!(flags & NOHZ_KICK_MASK));

rq->idle_balance = idle_cpu(cpu);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 356a245..e87e1b3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10453,6 +10453,24 @@ static bool nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)
return true;
}

+/*
+ * Check if we need to run the ILB for updating blocked load before entering
+ * idle state.
+ */
+void nohz_run_idle_balance(int cpu)
+{
+ unsigned int flags;
+
+ flags = atomic_fetch_andnot(NOHZ_NEWILB_KICK, nohz_flags(cpu));
+
+ /*
+ * Update the blocked load only if no SCHED_SOFTIRQ is about to happen
+ * (ie NOHZ_STATS_KICK set) and will do the same.
+ */
+ if ((flags == NOHZ_NEWILB_KICK) && !need_resched())
+ _nohz_idle_balance(cpu_rq(cpu), NOHZ_STATS_KICK, CPU_IDLE);
+}
+
static void nohz_newidle_balance(struct rq *this_rq)
{
int this_cpu = this_rq->cpu;
@@ -10474,10 +10492,10 @@ static void nohz_newidle_balance(struct rq *this_rq)
return;

/*
- * Blocked load of idle CPUs need to be updated.
- * Kick an ILB to update statistics.
+ * Set the need to trigger ILB in order to update blocked load
+ * before entering idle state.
*/
- kick_ilb(NOHZ_STATS_KICK);
+ atomic_or(NOHZ_NEWILB_KICK, nohz_flags(this_cpu));
}

#else /* !CONFIG_NO_HZ_COMMON */
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 7199e6f..7a92d60 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -261,6 +261,12 @@ exit_idle:
static void do_idle(void)
{
int cpu = smp_processor_id();
+
+ /*
+ * Check if we need to update blocked load
+ */
+ nohz_run_idle_balance(cpu);
+
/*
* If the arch has a polling bit, we maintain an invariant:
*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 10a1522..0ddc9a6 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2385,9 +2385,11 @@ extern void cfs_bandwidth_usage_dec(void);
#ifdef CONFIG_NO_HZ_COMMON
#define NOHZ_BALANCE_KICK_BIT 0
#define NOHZ_STATS_KICK_BIT 1
+#define NOHZ_NEWILB_KICK_BIT 2

#define NOHZ_BALANCE_KICK BIT(NOHZ_BALANCE_KICK_BIT)
#define NOHZ_STATS_KICK BIT(NOHZ_STATS_KICK_BIT)
+#define NOHZ_NEWILB_KICK BIT(NOHZ_NEWILB_KICK_BIT)

#define NOHZ_KICK_MASK (NOHZ_BALANCE_KICK | NOHZ_STATS_KICK)

@@ -2398,6 +2400,11 @@ extern void nohz_balance_exit_idle(struct rq *rq);
static inline void nohz_balance_exit_idle(struct rq *rq) { }
#endif

+#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON)
+extern void nohz_run_idle_balance(int cpu);
+#else
+static inline void nohz_run_idle_balance(int cpu) { }
+#endif

#ifdef CONFIG_SMP
static inline

2021-03-06 11:47:15

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Reorder newidle_balance pulled_task tests

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 6553fc18179113a11835d5fde1735259f8943a55
Gitweb: https://git.kernel.org/tip/6553fc18179113a11835d5fde1735259f8943a55
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:05 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/fair: Reorder newidle_balance pulled_task tests

Reorder the tests and skip the useless ones when no load balance has been
performed and the rq lock has not been released.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3c00918..356a245 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -10584,7 +10584,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
if (curr_cost > this_rq->max_idle_balance_cost)
this_rq->max_idle_balance_cost = curr_cost;

-out:
/*
* While browsing the domains, we released the rq lock, a task could
* have been enqueued in the meantime. Since we're not going idle,
@@ -10593,14 +10592,15 @@ out:
if (this_rq->cfs.h_nr_running && !pulled_task)
pulled_task = 1;

- /* Move the next balance forward */
- if (time_after(this_rq->next_balance, next_balance))
- this_rq->next_balance = next_balance;
-
/* Is there a task of a high priority class? */
if (this_rq->nr_running != this_rq->cfs.h_nr_running)
pulled_task = -1;

+out:
+ /* Move the next balance forward */
+ if (time_after(this_rq->next_balance, next_balance))
+ this_rq->next_balance = next_balance;
+
if (pulled_task)
this_rq->idle_stamp = 0;
else

2021-03-06 11:47:34

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove update of blocked load from newidle_balance

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 0826530de3cbdc89e60a89e86def94a5f0fc81ca
Gitweb: https://git.kernel.org/tip/0826530de3cbdc89e60a89e86def94a5f0fc81ca
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:01 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/fair: Remove update of blocked load from newidle_balance

newidle_balance runs with both preempt and irq disabled, which prevents
local IRQs from running during this period. The time needed to update the
blocked load of the CPUs varies with the number of CPU cgroups carrying
non-decayed load and can extend this critical section to an uncontrolled
length.

Remove the update from newidle_balance and trigger a normal ILB that
will take care of the update instead.

This reduces the IRQ latency from O(nr_cgroups * nr_nohz_cpus) to
O(nr_cgroups).

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 33 +++++----------------------------
1 file changed, 5 insertions(+), 28 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb..806e16f 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7392,8 +7392,6 @@ enum migration_type {
#define LBF_NEED_BREAK 0x02
#define LBF_DST_PINNED 0x04
#define LBF_SOME_PINNED 0x08
-#define LBF_NOHZ_STATS 0x10
-#define LBF_NOHZ_AGAIN 0x20

struct lb_env {
struct sched_domain *sd;
@@ -8397,9 +8395,6 @@ static inline void update_sg_lb_stats(struct lb_env *env,
for_each_cpu_and(i, sched_group_span(group), env->cpus) {
struct rq *rq = cpu_rq(i);

- if ((env->flags & LBF_NOHZ_STATS) && update_nohz_stats(rq, false))
- env->flags |= LBF_NOHZ_AGAIN;
-
sgs->group_load += cpu_load(rq);
sgs->group_util += cpu_util(i);
sgs->group_runnable += cpu_runnable(rq);
@@ -8940,11 +8935,6 @@ static inline void update_sd_lb_stats(struct lb_env *env, struct sd_lb_stats *sd
struct sg_lb_stats tmp_sgs;
int sg_status = 0;

-#ifdef CONFIG_NO_HZ_COMMON
- if (env->idle == CPU_NEWLY_IDLE && READ_ONCE(nohz.has_blocked))
- env->flags |= LBF_NOHZ_STATS;
-#endif
-
do {
struct sg_lb_stats *sgs = &tmp_sgs;
int local_group;
@@ -8981,14 +8971,6 @@ next_group:
/* Tag domain that child domain prefers tasks go to siblings first */
sds->prefer_sibling = child && child->flags & SD_PREFER_SIBLING;

-#ifdef CONFIG_NO_HZ_COMMON
- if ((env->flags & LBF_NOHZ_AGAIN) &&
- cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd))) {
-
- WRITE_ONCE(nohz.next_blocked,
- jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD));
- }
-#endif

if (env->sd->flags & SD_NUMA)
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
@@ -10517,16 +10499,11 @@ static void nohz_newidle_balance(struct rq *this_rq)
time_before(jiffies, READ_ONCE(nohz.next_blocked)))
return;

- raw_spin_unlock(&this_rq->lock);
/*
- * This CPU is going to be idle and blocked load of idle CPUs
- * need to be updated. Run the ilb locally as it is a good
- * candidate for ilb instead of waking up another idle CPU.
- * Kick an normal ilb if we failed to do the update.
+ * Blocked load of idle CPUs need to be updated.
+ * Kick an ILB to update statistics.
*/
- if (!_nohz_idle_balance(this_rq, NOHZ_STATS_KICK, CPU_NEWLY_IDLE))
- kick_ilb(NOHZ_STATS_KICK);
- raw_spin_lock(&this_rq->lock);
+ kick_ilb(NOHZ_STATS_KICK);
}

#else /* !CONFIG_NO_HZ_COMMON */
@@ -10587,8 +10564,6 @@ static int newidle_balance(struct rq *this_rq, struct rq_flags *rf)
update_next_balance(sd, &next_balance);
rcu_read_unlock();

- nohz_newidle_balance(this_rq);
-
goto out;
}

@@ -10654,6 +10629,8 @@ out:

if (pulled_task)
this_rq->idle_stamp = 0;
+ else
+ nohz_newidle_balance(this_rq);

rq_repin_lock(this_rq, rf);

2021-03-06 11:47:34

by tip-bot2 for Vincent Guittot

[permalink] [raw]
Subject: [tip: sched/core] sched/fair: Remove unused parameter of update_nohz_stats

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 64f84f273592d17dcdca20244168ad9f525a39c3
Gitweb: https://git.kernel.org/tip/64f84f273592d17dcdca20244168ad9f525a39c3
Author: Vincent Guittot <[email protected]>
AuthorDate: Wed, 24 Feb 2021 14:30:03 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched/fair: Remove unused parameter of update_nohz_stats

Idle load balance is the only user of update_nohz_stats() and doesn't use
the force parameter. Remove it.

Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reviewed-by: Valentin Schneider <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6a458e9..1b91030 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8352,7 +8352,7 @@ group_type group_classify(unsigned int imbalance_pct,
return group_has_spare;
}

-static bool update_nohz_stats(struct rq *rq, bool force)
+static bool update_nohz_stats(struct rq *rq)
{
#ifdef CONFIG_NO_HZ_COMMON
unsigned int cpu = rq->cpu;
@@ -8363,7 +8363,7 @@ static bool update_nohz_stats(struct rq *rq, bool force)
if (!cpumask_test_cpu(cpu, nohz.idle_cpus_mask))
return false;

- if (!force && !time_after(jiffies, rq->last_blocked_load_update_tick))
+ if (!time_after(jiffies, rq->last_blocked_load_update_tick))
return true;

update_blocked_averages(cpu);
@@ -10401,7 +10401,7 @@ static void _nohz_idle_balance(struct rq *this_rq, unsigned int flags,

rq = cpu_rq(balance_cpu);

- has_blocked_load |= update_nohz_stats(rq, true);
+ has_blocked_load |= update_nohz_stats(rq);

/*
* If time for next balance is due,