2013-04-11 23:51:13

by Frederic Weisbecker

Subject: [RFC GIT PULL] sched: Fix a few missing rq clock updates

Hi,

So I revisited the rq clock series I had for dynticks. The patches
actually address upstream issues, so I refactored the fixes from
that angle and dropped the wrong assumption that the rq clock
relies on the tick for its updates.

Patches 1-4 fix some missing updates. Additionally, I removed
two of the updates from the previous set:

* No need to update the rq clock in idle_balance() because it should
follow a call to deactivate_task() (unless TIF_NEED_RESCHED is set
on idle with no new task on the runqueue; not sure we want to cover that case).

* No need to update for try_to_wake_up_local() -> ttwu_do_wakeup() -> check_preempt_curr()
as it follows a deactivate_task().

Patch 5 brings accessors that will be necessary to set up an rq clock
debugging engine. What remains is to tag the scheduler's entry/exit points
and report missing or redundant update_rq_clock() calls ahead of
rq_clock() and rq_clock_task() reads.
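
To make that plan a bit more concrete, here is a rough sketch (not part
of this series) of the kind of check the accessors could host once the
entry/exit points are tagged. The clock_update_flags field, the
RQCF_UPDATED bit and the rq_clock_enter() helper are illustrative names
only:

/* Illustrative sketch only: field, flag and helper names are made up */
#ifdef CONFIG_SCHED_DEBUG
#define RQCF_UPDATED	0x01	/* set by update_rq_clock() */

/* Called from a tagged scheduler entry point, with rq->lock held */
static inline void rq_clock_enter(struct rq *rq)
{
	rq->clock_update_flags = 0;
}

static inline u64 rq_clock(struct rq *rq)
{
	/* Missing update: nobody called update_rq_clock() since entry */
	WARN_ON_ONCE(!(rq->clock_update_flags & RQCF_UPDATED));
	return rq->clock;
}
#endif

A redundant update could be flagged symmetrically, by warning in
update_rq_clock() when RQCF_UPDATED is already set on entry.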

The branch is pullable from:

git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks.git
sched/core

HEAD: 8b5fa0f1de7e41176dc68740de7548110a404538

Thanks.

---
Frederic Weisbecker (5):
sched: Update rq clock before migrating tasks out of dying CPU
sched: Update rq clock before setting fair group shares
sched: Update rq clock before calling check_preempt_curr()
sched: Update rq clock earlier in unthrottle_cfs_rq
sched: Use an accessor to read rq clock

kernel/sched/core.c | 15 ++++++++++--
kernel/sched/fair.c | 51 +++++++++++++++++++++++++--------------------
kernel/sched/rt.c | 8 +++---
kernel/sched/sched.h | 10 +++++++++
kernel/sched/stats.h | 8 +++---
kernel/sched/stop_task.c | 8 +++---
6 files changed, 62 insertions(+), 38 deletions(-)

--
1.7.5.4


2013-04-11 23:51:19

by Frederic Weisbecker

Subject: [PATCH 1/5] sched: Update rq clock before migrating tasks out of dying CPU

The sched_class::put_prev_task() callbacks of the rt and fair
classes refer to the rq clock to update their runtime statistics,
but there is a missing rq clock update in the CPU hotplug
notifier's entry point into the scheduler.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
kernel/sched/core.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ee8c1bd..ec036d8 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4928,6 +4928,13 @@ static void migrate_tasks(unsigned int dead_cpu)
*/
rq->stop = NULL;

+ /*
+ * put_prev_task() and pick_next_task() sched
+ * class method both need to have an up-to-date
+ * value of rq->clock[_task]
+ */
+ update_rq_clock(rq);
+
for ( ; ; ) {
/*
* There's this thread running, bail when that's the only
--
1.7.5.4

2013-04-11 23:51:29

by Frederic Weisbecker

Subject: [PATCH 3/5] sched: Update rq clock before calling check_preempt_curr()

check_preempt_curr() of the fair class needs an up-to-date sched clock
value to update the runtime stats of the current task on the target rq.

When a task is woken up, activate_task() is usually called right before
ttwu_do_wakeup(), unless the task is still on the runqueue. In the latter
case we need to update the rq clock explicitly because activate_task()
isn't there to do the job for us.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
kernel/sched/core.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec036d8..5f3e05d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1334,6 +1334,8 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)

rq = __task_rq_lock(p);
if (p->on_rq) {
+ /* check_preempt_curr() may use rq clock */
+ update_rq_clock(rq);
ttwu_do_wakeup(rq, p, wake_flags);
ret = 1;
}
--
1.7.5.4

2013-04-11 23:51:32

by Frederic Weisbecker

Subject: [PATCH 4/5] sched: Update rq clock earlier in unthrottle_cfs_rq

In unthrottle_cfs_rq() we use rq->clock right before the rq clock is
updated; call update_rq_clock() before that use so that we don't read
a stale rq clock value.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
kernel/sched/fair.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c23dde4..fc7d01a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2280,12 +2280,14 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];

cfs_rq->throttled = 0;
+
+ update_rq_clock(rq);
+
raw_spin_lock(&cfs_b->lock);
cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);

- update_rq_clock(rq);
/* update hierarchical throttle state */
walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);

--
1.7.5.4

2013-04-11 23:51:38

by Frederic Weisbecker

Subject: [PATCH 5/5] sched: Use an accessor to read rq clock

Read the runqueue clock through an accessor. This
prepares for adding a debugging infrastructure to
detect missing or redundant calls to update_rq_clock()
between the scheduler's entry and exit points.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
kernel/sched/core.c | 6 +++---
kernel/sched/fair.c | 44 ++++++++++++++++++++++----------------------
kernel/sched/rt.c | 8 ++++----
kernel/sched/sched.h | 10 ++++++++++
kernel/sched/stats.h | 8 ++++----
kernel/sched/stop_task.c | 8 ++++----
6 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 5f3e05d..f836065 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -636,7 +636,7 @@ void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();

- while ((s64)(rq->clock - rq->age_stamp) > period) {
+ while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
/*
* Inline assembly required to prevent the compiler
* optimising this loop into a divmod call.
@@ -1297,7 +1297,7 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
p->sched_class->task_woken(rq, p);

if (rq->idle_stamp) {
- u64 delta = rq->clock - rq->idle_stamp;
+ u64 delta = rq_clock(rq) - rq->idle_stamp;
u64 max = 2*sysctl_sched_migration_cost;

if (delta > max)
@@ -2638,7 +2638,7 @@ static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)

if (task_current(rq, p)) {
update_rq_clock(rq);
- ns = rq->clock_task - p->se.exec_start;
+ ns = rq_clock_task(rq) - p->se.exec_start;
if ((s64)ns < 0)
ns = 0;
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index fc7d01a..603b7b1 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -686,7 +686,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
static void update_curr(struct cfs_rq *cfs_rq)
{
struct sched_entity *curr = cfs_rq->curr;
- u64 now = rq_of(cfs_rq)->clock_task;
+ u64 now = rq_clock_task(rq_of(cfs_rq));
unsigned long delta_exec;

if (unlikely(!curr))
@@ -718,7 +718,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
static inline void
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- schedstat_set(se->statistics.wait_start, rq_of(cfs_rq)->clock);
+ schedstat_set(se->statistics.wait_start, rq_clock(rq_of(cfs_rq)));
}

/*
@@ -738,14 +738,14 @@ static void
update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
schedstat_set(se->statistics.wait_max, max(se->statistics.wait_max,
- rq_of(cfs_rq)->clock - se->statistics.wait_start));
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start));
schedstat_set(se->statistics.wait_count, se->statistics.wait_count + 1);
schedstat_set(se->statistics.wait_sum, se->statistics.wait_sum +
- rq_of(cfs_rq)->clock - se->statistics.wait_start);
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
#ifdef CONFIG_SCHEDSTATS
if (entity_is_task(se)) {
trace_sched_stat_wait(task_of(se),
- rq_of(cfs_rq)->clock - se->statistics.wait_start);
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
}
#endif
schedstat_set(se->statistics.wait_start, 0);
@@ -771,7 +771,7 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
/*
* We are starting a new run period:
*/
- se->exec_start = rq_of(cfs_rq)->clock_task;
+ se->exec_start = rq_clock_task(rq_of(cfs_rq));
}

/**************************************************
@@ -1497,7 +1497,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)

static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
{
- __update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
+ __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable);
__update_tg_runnable_avg(&rq->avg, &rq->cfs);
}

@@ -1512,7 +1512,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
* accumulated while sleeping.
*/
if (unlikely(se->avg.decay_count <= 0)) {
- se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
+ se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
if (se->avg.decay_count) {
/*
* In a wake-up migration we have to approximate the
@@ -1586,7 +1586,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
tsk = task_of(se);

if (se->statistics.sleep_start) {
- u64 delta = rq_of(cfs_rq)->clock - se->statistics.sleep_start;
+ u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.sleep_start;

if ((s64)delta < 0)
delta = 0;
@@ -1603,7 +1603,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
}
}
if (se->statistics.block_start) {
- u64 delta = rq_of(cfs_rq)->clock - se->statistics.block_start;
+ u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.block_start;

if ((s64)delta < 0)
delta = 0;
@@ -1784,9 +1784,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
struct task_struct *tsk = task_of(se);

if (tsk->state & TASK_INTERRUPTIBLE)
- se->statistics.sleep_start = rq_of(cfs_rq)->clock;
+ se->statistics.sleep_start = rq_clock(rq_of(cfs_rq));
if (tsk->state & TASK_UNINTERRUPTIBLE)
- se->statistics.block_start = rq_of(cfs_rq)->clock;
+ se->statistics.block_start = rq_clock(rq_of(cfs_rq));
}
#endif
}
@@ -2061,7 +2061,7 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
if (unlikely(cfs_rq->throttle_count))
return cfs_rq->throttled_clock_task;

- return rq_of(cfs_rq)->clock_task - cfs_rq->throttled_clock_task_time;
+ return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
}

/* returns 0 on failure to allocate runtime */
@@ -2120,7 +2120,7 @@ static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);

/* if the deadline is ahead of our clock, nothing to do */
- if (likely((s64)(rq->clock - cfs_rq->runtime_expires) < 0))
+ if (likely((s64)(rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires) < 0))
return;

if (cfs_rq->runtime_remaining < 0)
@@ -2209,7 +2209,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
#ifdef CONFIG_SMP
if (!cfs_rq->throttle_count) {
/* adjust cfs_rq_clock_task() */
- cfs_rq->throttled_clock_task_time += rq->clock_task -
+ cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
cfs_rq->throttled_clock_task;
}
#endif
@@ -2224,7 +2224,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)

/* group is entering throttled state, stop time */
if (!cfs_rq->throttle_count)
- cfs_rq->throttled_clock_task = rq->clock_task;
+ cfs_rq->throttled_clock_task = rq_clock_task(rq);
cfs_rq->throttle_count++;

return 0;
@@ -2263,7 +2263,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
rq->nr_running -= task_delta;

cfs_rq->throttled = 1;
- cfs_rq->throttled_clock = rq->clock;
+ cfs_rq->throttled_clock = rq_clock(rq);
raw_spin_lock(&cfs_b->lock);
list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
raw_spin_unlock(&cfs_b->lock);
@@ -2284,7 +2284,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
update_rq_clock(rq);

raw_spin_lock(&cfs_b->lock);
- cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
+ cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);

@@ -2687,7 +2687,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
#else /* CONFIG_CFS_BANDWIDTH */
static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
{
- return rq_of(cfs_rq)->clock_task;
+ return rq_clock_task(rq_of(cfs_rq));
}

static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
@@ -3920,7 +3920,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
* 2) too many balance attempts have failed.
*/

- tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
+ tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
if (!tsk_cache_hot ||
env->sd->nr_balance_failed > env->sd->cache_nice_tries) {

@@ -4282,7 +4282,7 @@ static unsigned long scale_rt_power(int cpu)
age_stamp = ACCESS_ONCE(rq->age_stamp);
avg = ACCESS_ONCE(rq->rt_avg);

- total = sched_avg_period() + (rq->clock - age_stamp);
+ total = sched_avg_period() + (rq_clock(rq) - age_stamp);

if (unlikely(total < avg)) {
/* Ensures that power won't end up being negative */
@@ -5214,7 +5214,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
int pulled_task = 0;
unsigned long next_balance = jiffies + HZ;

- this_rq->idle_stamp = this_rq->clock;
+ this_rq->idle_stamp = rq_clock(this_rq);

if (this_rq->avg_idle < sysctl_sched_migration_cost)
return;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 127a2c4..3dfffc4 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -926,7 +926,7 @@ static void update_curr_rt(struct rq *rq)
if (curr->sched_class != &rt_sched_class)
return;

- delta_exec = rq->clock_task - curr->se.exec_start;
+ delta_exec = rq_clock_task(rq) - curr->se.exec_start;
if (unlikely((s64)delta_exec <= 0))
return;

@@ -936,7 +936,7 @@ static void update_curr_rt(struct rq *rq)
curr->se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);

- curr->se.exec_start = rq->clock_task;
+ curr->se.exec_start = rq_clock_task(rq);
cpuacct_charge(curr, delta_exec);

sched_rt_avg_update(rq, delta_exec);
@@ -1385,7 +1385,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
} while (rt_rq);

p = rt_task_of(rt_se);
- p->se.exec_start = rq->clock_task;
+ p->se.exec_start = rq_clock_task(rq);

return p;
}
@@ -2037,7 +2037,7 @@ static void set_curr_task_rt(struct rq *rq)
{
struct task_struct *p = rq->curr;

- p->se.exec_start = rq->clock_task;
+ p->se.exec_start = rq_clock_task(rq);

/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8116cf8..425fa38 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -536,6 +536,16 @@ DECLARE_PER_CPU(struct rq, runqueues);
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() (&__raw_get_cpu_var(runqueues))

+static inline u64 rq_clock(struct rq *rq)
+{
+ return rq->clock;
+}
+
+static inline u64 rq_clock_task(struct rq *rq)
+{
+ return rq->clock_task;
+}
+
#ifdef CONFIG_SMP

#define rcu_dereference_check_sched_domain(p) \
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 2ef90a5..17d7065 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -61,7 +61,7 @@ static inline void sched_info_reset_dequeued(struct task_struct *t)
*/
static inline void sched_info_dequeued(struct task_struct *t)
{
- unsigned long long now = task_rq(t)->clock, delta = 0;
+ unsigned long long now = rq_clock(task_rq(t)), delta = 0;

if (unlikely(sched_info_on()))
if (t->sched_info.last_queued)
@@ -79,7 +79,7 @@ static inline void sched_info_dequeued(struct task_struct *t)
*/
static void sched_info_arrive(struct task_struct *t)
{
- unsigned long long now = task_rq(t)->clock, delta = 0;
+ unsigned long long now = rq_clock(task_rq(t)), delta = 0;

if (t->sched_info.last_queued)
delta = now - t->sched_info.last_queued;
@@ -100,7 +100,7 @@ static inline void sched_info_queued(struct task_struct *t)
{
if (unlikely(sched_info_on()))
if (!t->sched_info.last_queued)
- t->sched_info.last_queued = task_rq(t)->clock;
+ t->sched_info.last_queued = rq_clock(task_rq(t));
}

/*
@@ -112,7 +112,7 @@ static inline void sched_info_queued(struct task_struct *t)
*/
static inline void sched_info_depart(struct task_struct *t)
{
- unsigned long long delta = task_rq(t)->clock -
+ unsigned long long delta = rq_clock(task_rq(t)) -
t->sched_info.last_arrival;

rq_sched_info_depart(task_rq(t), delta);
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index da5eb5b..e08fbee 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -28,7 +28,7 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
struct task_struct *stop = rq->stop;

if (stop && stop->on_rq) {
- stop->se.exec_start = rq->clock_task;
+ stop->se.exec_start = rq_clock_task(rq);
return stop;
}

@@ -57,7 +57,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
struct task_struct *curr = rq->curr;
u64 delta_exec;

- delta_exec = rq->clock_task - curr->se.exec_start;
+ delta_exec = rq_clock_task(rq) - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0))
delta_exec = 0;

@@ -67,7 +67,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
curr->se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);

- curr->se.exec_start = rq->clock_task;
+ curr->se.exec_start = rq_clock_task(rq);
cpuacct_charge(curr, delta_exec);
}

@@ -79,7 +79,7 @@ static void set_curr_task_stop(struct rq *rq)
{
struct task_struct *stop = rq->stop;

- stop->se.exec_start = rq->clock_task;
+ stop->se.exec_start = rq_clock_task(rq);
}

static void switched_to_stop(struct rq *rq, struct task_struct *p)
--
1.7.5.4

2013-04-11 23:58:08

by Frederic Weisbecker

Subject: [PATCH 2/5] sched: Update rq clock before setting fair group shares

We may update the execution time (sched_group_set_shares() ->
update_cfs_shares() -> reweight_entity() -> update_curr()) before
reweighting the entity while setting the group shares, and this
requires an up-to-date runqueue clock.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
---
kernel/sched/fair.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 155783b..c23dde4 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6057,6 +6057,9 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
se = tg->se[i];
/* Propagate contribution to hierarchy */
raw_spin_lock_irqsave(&rq->lock, flags);
+
+ /* Possible calls to update_curr() need rq clock */
+ update_rq_clock(rq);
for_each_sched_entity(se)
update_cfs_shares(group_cfs_rq(se));
raw_spin_unlock_irqrestore(&rq->lock, flags);
--
1.7.5.4

2013-05-14 11:59:22

by Peter Zijlstra

Subject: Re: [RFC GIT PULL] sched: Fix a few missing rq clock updates

On Fri, Apr 12, 2013 at 01:50:57AM +0200, Frederic Weisbecker wrote:
> Hi,
>
> So I revisited the rq clock series I had for dynticks. The patches
> actually were about upstream issues so I refactored the fixes
> under that angle and gave up with the wrong assumption that rq
> clock relies on the tick for its updates.
>
> Patches 1-4 fix some missing updates. Additionally I removed
> 2 of these updates from the previous set:
>
> * No need to update the rq clock on idle_balance() because it should
> follow a call to deactivate_task() (unless TIF_NEED_RESCHED is set
> on idle without new task on the runqueue, not sure we want to cover that).
>
> * No need to update for try_to_wake_up_local() -> ttwu_do_wakeup() -> check_preempt_curr()
> as it's following deactivate_task().
>
> Patch 5 brings accessors that will be necessary to settle an rq clock
> debugging engine. What remains is to tag scheduler's entry/exit points
> and report missing or redundant update_rq_clock() before calls to
> rq_clock() and rq_clock_task().

Just noticed this queue was still sitting in the INBOX; I took it and will
hand it to Ingo soon-ish.

Thanks!

2013-05-14 12:46:53

by Frederic Weisbecker

Subject: Re: [RFC GIT PULL] sched: Fix a few missing rq clock updates

On Tue, May 14, 2013 at 01:57:33PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 12, 2013 at 01:50:57AM +0200, Frederic Weisbecker wrote:
> > Hi,
> >
> > So I revisited the rq clock series I had for dynticks. The patches
> > actually were about upstream issues so I refactored the fixes
> > under that angle and gave up with the wrong assumption that rq
> > clock relies on the tick for its updates.
> >
> > Patches 1-4 fix some missing updates. Additionally I removed
> > 2 of these updates from the previous set:
> >
> > * No need to update the rq clock on idle_balance() because it should
> > follow a call to deactivate_task() (unless TIF_NEED_RESCHED is set
> > on idle without new task on the runqueue, not sure we want to cover that).
> >
> > * No need to update for try_to_wake_up_local() -> ttwu_do_wakeup() -> check_preempt_curr()
> > as it's following deactivate_task().
> >
> > Patch 5 brings accessors that will be necessary to settle an rq clock
> > debugging engine. What remains is to tag scheduler's entry/exit points
> > and report missing or redundant update_rq_clock() before calls to
> > rq_clock() and rq_clock_task().
>
> Just noticed this queue was still sitting in the INBOX; I took it and will
> hand it to Ingo soon-ish.

Thanks a lot!

Subject: [tip:sched/core] sched: Update rq clock before migrating tasks out of dying CPU

Commit-ID: 77bd39702f0b3840cea17681409270b16a3b93c0
Gitweb: http://git.kernel.org/tip/77bd39702f0b3840cea17681409270b16a3b93c0
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 12 Apr 2013 01:50:58 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 May 2013 09:40:23 +0200

sched: Update rq clock before migrating tasks out of dying CPU

The sched_class::put_prev_task() callbacks of the rt and fair
classes refer to the rq clock to update their runtime statistics,
but there is a missing rq clock update in the CPU hotplug
notifier's entry point into the scheduler.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79e48e6..7bf0418 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4378,6 +4378,13 @@ static void migrate_tasks(unsigned int dead_cpu)
*/
rq->stop = NULL;

+ /*
+ * put_prev_task() and pick_next_task() sched
+ * class method both need to have an up-to-date
+ * value of rq->clock[_task]
+ */
+ update_rq_clock(rq);
+
for ( ; ; ) {
/*
* There's this thread running, bail when that's the only

Subject: [tip:sched/core] sched: Update rq clock before setting fair group shares

Commit-ID: 71b1da46ff70309a2ec12ce943aa0d192d2c8f0c
Gitweb: http://git.kernel.org/tip/71b1da46ff70309a2ec12ce943aa0d192d2c8f0c
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 12 Apr 2013 01:50:59 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 May 2013 09:40:23 +0200

sched: Update rq clock before setting fair group shares

We may update the execution time in

sched_group_set_shares()->update_cfs_shares()->reweight_entity()->update_curr()

before reweighting the entity while setting the group shares, and this
requires an up-to-date runqueue clock.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f62b16d..f76ca21 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6107,6 +6107,9 @@ int sched_group_set_shares(struct task_group *tg, unsigned long shares)
se = tg->se[i];
/* Propagate contribution to hierarchy */
raw_spin_lock_irqsave(&rq->lock, flags);
+
+ /* Possible calls to update_curr() need rq clock */
+ update_rq_clock(rq);
for_each_sched_entity(se)
update_cfs_shares(group_cfs_rq(se));
raw_spin_unlock_irqrestore(&rq->lock, flags);

Subject: [tip:sched/core] sched: Update rq clock before calling check_preempt_curr()

Commit-ID: 1ad4ec0dc740c4183acd6d6e367ca52b28e4fa94
Gitweb: http://git.kernel.org/tip/1ad4ec0dc740c4183acd6d6e367ca52b28e4fa94
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 12 Apr 2013 01:51:00 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 May 2013 09:40:24 +0200

sched: Update rq clock before calling check_preempt_curr()

check_preempt_curr() of the fair class needs an up-to-date sched clock
value to update the runtime stats of the current task on the target rq.

When a task is woken up, activate_task() is usually called right before
ttwu_do_wakeup(), unless the task is still on the runqueue. In the latter
case we need to update the rq clock explicitly because activate_task()
isn't there to do the job for us.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7bf0418..46d0017 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1365,6 +1365,8 @@ static int ttwu_remote(struct task_struct *p, int wake_flags)

rq = __task_rq_lock(p);
if (p->on_rq) {
+ /* check_preempt_curr() may use rq clock */
+ update_rq_clock(rq);
ttwu_do_wakeup(rq, p, wake_flags);
ret = 1;
}

Subject: [tip:sched/core] sched: Update rq clock earlier in unthrottle_cfs_rq

Commit-ID: 1a55af2e45cce0ff13bc33c8ee99da84e188b615
Gitweb: http://git.kernel.org/tip/1a55af2e45cce0ff13bc33c8ee99da84e188b615
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 12 Apr 2013 01:51:01 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 May 2013 09:40:25 +0200

sched: Update rq clock earlier in unthrottle_cfs_rq

In unthrottle_cfs_rq() we use rq->clock right before the rq clock is
updated; call update_rq_clock() before that use so that we don't read
a stale rq clock value.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/fair.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index f76ca21..1c8762a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2319,12 +2319,14 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
se = cfs_rq->tg->se[cpu_of(rq_of(cfs_rq))];

cfs_rq->throttled = 0;
+
+ update_rq_clock(rq);
+
raw_spin_lock(&cfs_b->lock);
cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);

- update_rq_clock(rq);
/* update hierarchical throttle state */
walk_tg_tree_from(cfs_rq->tg, tg_nop, tg_unthrottle_up, (void *)rq);

Subject: [tip:sched/core] sched: Use an accessor to read the rq clock

Commit-ID: 78becc27097585c6aec7043834cadde950ae79f2
Gitweb: http://git.kernel.org/tip/78becc27097585c6aec7043834cadde950ae79f2
Author: Frederic Weisbecker <[email protected]>
AuthorDate: Fri, 12 Apr 2013 01:51:02 +0200
Committer: Ingo Molnar <[email protected]>
CommitDate: Tue, 28 May 2013 09:40:27 +0200

sched: Use an accessor to read the rq clock

Read the runqueue clock through an accessor. This
prepares for adding a debugging infrastructure to
detect missing or redundant calls to update_rq_clock()
between the scheduler's entry and exit points.

Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Li Zhong <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Turner <[email protected]>
Cc: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
---
kernel/sched/core.c | 6 +++---
kernel/sched/fair.c | 44 ++++++++++++++++++++++----------------------
kernel/sched/rt.c | 8 ++++----
kernel/sched/sched.h | 10 ++++++++++
kernel/sched/stats.h | 8 ++++----
kernel/sched/stop_task.c | 8 ++++----
6 files changed, 47 insertions(+), 37 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 46d0017..36f85be 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -667,7 +667,7 @@ void sched_avg_update(struct rq *rq)
{
s64 period = sched_avg_period();

- while ((s64)(rq->clock - rq->age_stamp) > period) {
+ while ((s64)(rq_clock(rq) - rq->age_stamp) > period) {
/*
* Inline assembly required to prevent the compiler
* optimising this loop into a divmod call.
@@ -1328,7 +1328,7 @@ ttwu_do_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
p->sched_class->task_woken(rq, p);

if (rq->idle_stamp) {
- u64 delta = rq->clock - rq->idle_stamp;
+ u64 delta = rq_clock(rq) - rq->idle_stamp;
u64 max = 2*sysctl_sched_migration_cost;

if (delta > max)
@@ -2106,7 +2106,7 @@ static u64 do_task_delta_exec(struct task_struct *p, struct rq *rq)

if (task_current(rq, p)) {
update_rq_clock(rq);
- ns = rq->clock_task - p->se.exec_start;
+ ns = rq_clock_task(rq) - p->se.exec_start;
if ((s64)ns < 0)
ns = 0;
}
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1c8762a..3ee1c2e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -704,7 +704,7 @@ __update_curr(struct cfs_rq *cfs_rq, struct sched_entity *curr,
static void update_curr(struct cfs_rq *cfs_rq)
{
struct sched_entity *curr = cfs_rq->curr;
- u64 now = rq_of(cfs_rq)->clock_task;
+ u64 now = rq_clock_task(rq_of(cfs_rq));
unsigned long delta_exec;

if (unlikely(!curr))
@@ -736,7 +736,7 @@ static void update_curr(struct cfs_rq *cfs_rq)
static inline void
update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- schedstat_set(se->statistics.wait_start, rq_of(cfs_rq)->clock);
+ schedstat_set(se->statistics.wait_start, rq_clock(rq_of(cfs_rq)));
}

/*
@@ -756,14 +756,14 @@ static void
update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
{
schedstat_set(se->statistics.wait_max, max(se->statistics.wait_max,
- rq_of(cfs_rq)->clock - se->statistics.wait_start));
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start));
schedstat_set(se->statistics.wait_count, se->statistics.wait_count + 1);
schedstat_set(se->statistics.wait_sum, se->statistics.wait_sum +
- rq_of(cfs_rq)->clock - se->statistics.wait_start);
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
#ifdef CONFIG_SCHEDSTATS
if (entity_is_task(se)) {
trace_sched_stat_wait(task_of(se),
- rq_of(cfs_rq)->clock - se->statistics.wait_start);
+ rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start);
}
#endif
schedstat_set(se->statistics.wait_start, 0);
@@ -789,7 +789,7 @@ update_stats_curr_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
/*
* We are starting a new run period:
*/
- se->exec_start = rq_of(cfs_rq)->clock_task;
+ se->exec_start = rq_clock_task(rq_of(cfs_rq));
}

/**************************************************
@@ -1515,7 +1515,7 @@ static void update_cfs_rq_blocked_load(struct cfs_rq *cfs_rq, int force_update)

static inline void update_rq_runnable_avg(struct rq *rq, int runnable)
{
- __update_entity_runnable_avg(rq->clock_task, &rq->avg, runnable);
+ __update_entity_runnable_avg(rq_clock_task(rq), &rq->avg, runnable);
__update_tg_runnable_avg(&rq->avg, &rq->cfs);
}

@@ -1530,7 +1530,7 @@ static inline void enqueue_entity_load_avg(struct cfs_rq *cfs_rq,
* accumulated while sleeping.
*/
if (unlikely(se->avg.decay_count <= 0)) {
- se->avg.last_runnable_update = rq_of(cfs_rq)->clock_task;
+ se->avg.last_runnable_update = rq_clock_task(rq_of(cfs_rq));
if (se->avg.decay_count) {
/*
* In a wake-up migration we have to approximate the
@@ -1625,7 +1625,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
tsk = task_of(se);

if (se->statistics.sleep_start) {
- u64 delta = rq_of(cfs_rq)->clock - se->statistics.sleep_start;
+ u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.sleep_start;

if ((s64)delta < 0)
delta = 0;
@@ -1642,7 +1642,7 @@ static void enqueue_sleeper(struct cfs_rq *cfs_rq, struct sched_entity *se)
}
}
if (se->statistics.block_start) {
- u64 delta = rq_of(cfs_rq)->clock - se->statistics.block_start;
+ u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.block_start;

if ((s64)delta < 0)
delta = 0;
@@ -1823,9 +1823,9 @@ dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
struct task_struct *tsk = task_of(se);

if (tsk->state & TASK_INTERRUPTIBLE)
- se->statistics.sleep_start = rq_of(cfs_rq)->clock;
+ se->statistics.sleep_start = rq_clock(rq_of(cfs_rq));
if (tsk->state & TASK_UNINTERRUPTIBLE)
- se->statistics.block_start = rq_of(cfs_rq)->clock;
+ se->statistics.block_start = rq_clock(rq_of(cfs_rq));
}
#endif
}
@@ -2100,7 +2100,7 @@ static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
if (unlikely(cfs_rq->throttle_count))
return cfs_rq->throttled_clock_task;

- return rq_of(cfs_rq)->clock_task - cfs_rq->throttled_clock_task_time;
+ return rq_clock_task(rq_of(cfs_rq)) - cfs_rq->throttled_clock_task_time;
}

/* returns 0 on failure to allocate runtime */
@@ -2159,7 +2159,7 @@ static void expire_cfs_rq_runtime(struct cfs_rq *cfs_rq)
struct rq *rq = rq_of(cfs_rq);

/* if the deadline is ahead of our clock, nothing to do */
- if (likely((s64)(rq->clock - cfs_rq->runtime_expires) < 0))
+ if (likely((s64)(rq_clock(rq_of(cfs_rq)) - cfs_rq->runtime_expires) < 0))
return;

if (cfs_rq->runtime_remaining < 0)
@@ -2248,7 +2248,7 @@ static int tg_unthrottle_up(struct task_group *tg, void *data)
#ifdef CONFIG_SMP
if (!cfs_rq->throttle_count) {
/* adjust cfs_rq_clock_task() */
- cfs_rq->throttled_clock_task_time += rq->clock_task -
+ cfs_rq->throttled_clock_task_time += rq_clock_task(rq) -
cfs_rq->throttled_clock_task;
}
#endif
@@ -2263,7 +2263,7 @@ static int tg_throttle_down(struct task_group *tg, void *data)

/* group is entering throttled state, stop time */
if (!cfs_rq->throttle_count)
- cfs_rq->throttled_clock_task = rq->clock_task;
+ cfs_rq->throttled_clock_task = rq_clock_task(rq);
cfs_rq->throttle_count++;

return 0;
@@ -2302,7 +2302,7 @@ static void throttle_cfs_rq(struct cfs_rq *cfs_rq)
rq->nr_running -= task_delta;

cfs_rq->throttled = 1;
- cfs_rq->throttled_clock = rq->clock;
+ cfs_rq->throttled_clock = rq_clock(rq);
raw_spin_lock(&cfs_b->lock);
list_add_tail_rcu(&cfs_rq->throttled_list, &cfs_b->throttled_cfs_rq);
raw_spin_unlock(&cfs_b->lock);
@@ -2323,7 +2323,7 @@ void unthrottle_cfs_rq(struct cfs_rq *cfs_rq)
update_rq_clock(rq);

raw_spin_lock(&cfs_b->lock);
- cfs_b->throttled_time += rq->clock - cfs_rq->throttled_clock;
+ cfs_b->throttled_time += rq_clock(rq) - cfs_rq->throttled_clock;
list_del_rcu(&cfs_rq->throttled_list);
raw_spin_unlock(&cfs_b->lock);

@@ -2726,7 +2726,7 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
#else /* CONFIG_CFS_BANDWIDTH */
static inline u64 cfs_rq_clock_task(struct cfs_rq *cfs_rq)
{
- return rq_of(cfs_rq)->clock_task;
+ return rq_clock_task(rq_of(cfs_rq));
}

static void account_cfs_rq_runtime(struct cfs_rq *cfs_rq,
@@ -3966,7 +3966,7 @@ int can_migrate_task(struct task_struct *p, struct lb_env *env)
* 2) too many balance attempts have failed.
*/

- tsk_cache_hot = task_hot(p, env->src_rq->clock_task, env->sd);
+ tsk_cache_hot = task_hot(p, rq_clock_task(env->src_rq), env->sd);
if (!tsk_cache_hot ||
env->sd->nr_balance_failed > env->sd->cache_nice_tries) {

@@ -4322,7 +4322,7 @@ static unsigned long scale_rt_power(int cpu)
age_stamp = ACCESS_ONCE(rq->age_stamp);
avg = ACCESS_ONCE(rq->rt_avg);

- total = sched_avg_period() + (rq->clock - age_stamp);
+ total = sched_avg_period() + (rq_clock(rq) - age_stamp);

if (unlikely(total < avg)) {
/* Ensures that power won't end up being negative */
@@ -5261,7 +5261,7 @@ void idle_balance(int this_cpu, struct rq *this_rq)
int pulled_task = 0;
unsigned long next_balance = jiffies + HZ;

- this_rq->idle_stamp = this_rq->clock;
+ this_rq->idle_stamp = rq_clock(this_rq);

if (this_rq->avg_idle < sysctl_sched_migration_cost)
return;
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 8853ab1..8d85f9a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -886,7 +886,7 @@ static void update_curr_rt(struct rq *rq)
if (curr->sched_class != &rt_sched_class)
return;

- delta_exec = rq->clock_task - curr->se.exec_start;
+ delta_exec = rq_clock_task(rq) - curr->se.exec_start;
if (unlikely((s64)delta_exec <= 0))
return;

@@ -896,7 +896,7 @@ static void update_curr_rt(struct rq *rq)
curr->se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);

- curr->se.exec_start = rq->clock_task;
+ curr->se.exec_start = rq_clock_task(rq);
cpuacct_charge(curr, delta_exec);

sched_rt_avg_update(rq, delta_exec);
@@ -1345,7 +1345,7 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
} while (rt_rq);

p = rt_task_of(rt_se);
- p->se.exec_start = rq->clock_task;
+ p->se.exec_start = rq_clock_task(rq);

return p;
}
@@ -1997,7 +1997,7 @@ static void set_curr_task_rt(struct rq *rq)
{
struct task_struct *p = rq->curr;

- p->se.exec_start = rq->clock_task;
+ p->se.exec_start = rq_clock_task(rq);

/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c806c61..74ff659 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -548,6 +548,16 @@ DECLARE_PER_CPU(struct rq, runqueues);
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() (&__raw_get_cpu_var(runqueues))

+static inline u64 rq_clock(struct rq *rq)
+{
+ return rq->clock;
+}
+
+static inline u64 rq_clock_task(struct rq *rq)
+{
+ return rq->clock_task;
+}
+
#ifdef CONFIG_SMP

#define rcu_dereference_check_sched_domain(p) \
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 2ef90a5..17d7065 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -61,7 +61,7 @@ static inline void sched_info_reset_dequeued(struct task_struct *t)
*/
static inline void sched_info_dequeued(struct task_struct *t)
{
- unsigned long long now = task_rq(t)->clock, delta = 0;
+ unsigned long long now = rq_clock(task_rq(t)), delta = 0;

if (unlikely(sched_info_on()))
if (t->sched_info.last_queued)
@@ -79,7 +79,7 @@ static inline void sched_info_dequeued(struct task_struct *t)
*/
static void sched_info_arrive(struct task_struct *t)
{
- unsigned long long now = task_rq(t)->clock, delta = 0;
+ unsigned long long now = rq_clock(task_rq(t)), delta = 0;

if (t->sched_info.last_queued)
delta = now - t->sched_info.last_queued;
@@ -100,7 +100,7 @@ static inline void sched_info_queued(struct task_struct *t)
{
if (unlikely(sched_info_on()))
if (!t->sched_info.last_queued)
- t->sched_info.last_queued = task_rq(t)->clock;
+ t->sched_info.last_queued = rq_clock(task_rq(t));
}

/*
@@ -112,7 +112,7 @@ static inline void sched_info_queued(struct task_struct *t)
*/
static inline void sched_info_depart(struct task_struct *t)
{
- unsigned long long delta = task_rq(t)->clock -
+ unsigned long long delta = rq_clock(task_rq(t)) -
t->sched_info.last_arrival;

rq_sched_info_depart(task_rq(t), delta);
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index da5eb5b..e08fbee 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -28,7 +28,7 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
struct task_struct *stop = rq->stop;

if (stop && stop->on_rq) {
- stop->se.exec_start = rq->clock_task;
+ stop->se.exec_start = rq_clock_task(rq);
return stop;
}

@@ -57,7 +57,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
struct task_struct *curr = rq->curr;
u64 delta_exec;

- delta_exec = rq->clock_task - curr->se.exec_start;
+ delta_exec = rq_clock_task(rq) - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0))
delta_exec = 0;

@@ -67,7 +67,7 @@ static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
curr->se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);

- curr->se.exec_start = rq->clock_task;
+ curr->se.exec_start = rq_clock_task(rq);
cpuacct_charge(curr, delta_exec);
}

@@ -79,7 +79,7 @@ static void set_curr_task_stop(struct rq *rq)
{
struct task_struct *stop = rq->stop;

- stop->se.exec_start = rq->clock_task;
+ stop->se.exec_start = rq_clock_task(rq);
}

static void switched_to_stop(struct rq *rq, struct task_struct *p)