2012-08-03 13:42:41

by Mike Galbraith

[permalink] [raw]
Subject: [patch] sched: fix migration thread runtime bogosity

Greetings,

I have two bug reports of absurd migration thread CPU usage, one of them
with a link to a bisection..

https://bugs.gentoo.org/show_bug.cgi?id=394487

..fingering d670ec13 - posix-cpu-timers: Cure SMP wobbles

I reproduced with my -rt kernel and 3.4, but didn't manage to reproduce
with the 3.0 NOPREEMPT kernel it was reported against.

Three options below, two tested in my -rt kernel.

A [ ] stop class doesn't do bean counting
B [ ] stop class counts beans like everybody else
C [ ] none of the above (not appended for brevity:)

I prefer B, elite class tasks eat beans.

A: sched: fix migration thread runtime bogosity

stop class threads don't do bean counting. Wipe the evidence of their
lowly birth, and don't try to use exec_start which is never updated.

vogelweide:/:[0]# ps l 824
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 0 824 2 -100 - 0 0 cpu_st S ? 2799:06 [migration/57]

Signed-off-by: Mike Galbraith <[email protected]>
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 82ad284..82a78a6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -974,6 +974,13 @@ void sched_set_stop_task(int cpu, struct task_struct *stop)
sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);

stop->sched_class = &stop_sched_class;
+
+ /* Zero stale values for our non-accountable thread. */
+ stop->se.exec_start = 0;
+ stop->se.sum_exec_runtime = 0;
+ stop->se.prev_sum_exec_runtime = 0;
+ stop->stime = stop->stimescaled = 0;
+ stop->nvcsw = stop->nivcsw = 0;
}

cpu_rq(cpu)->stop = stop;
@@ -2803,7 +2810,8 @@ unsigned long long task_sched_runtime(struct task_struct *p)
u64 ns = 0;

rq = task_rq_lock(p, &flags);
- ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
+ if (likely(p != rq->stop)
+ ns = p->se.sum_exec_runtime + do_task_delta_exec(p, rq);
task_rq_unlock(rq, p, &flags);

return ns;


B: sched: fix migration thread runtime bogosity

stop class threads need to do bean counting lest the below happen.

vogelweide:/:[0]# ps l 824
F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
1 0 824 2 -100 - 0 0 cpu_st S ? 2799:06 [migration/57]

Signed-off-by: Mike Galbraith <[email protected]>

diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 7b386e8..da5eb5b 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -27,8 +27,10 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
{
struct task_struct *stop = rq->stop;

- if (stop && stop->on_rq)
+ if (stop && stop->on_rq) {
+ stop->se.exec_start = rq->clock_task;
return stop;
+ }

return NULL;
}
@@ -52,6 +54,21 @@ static void yield_task_stop(struct rq *rq)

static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
{
+ struct task_struct *curr = rq->curr;
+ u64 delta_exec;
+
+ delta_exec = rq->clock_task - curr->se.exec_start;
+ if (unlikely((s64)delta_exec < 0))
+ delta_exec = 0;
+
+ schedstat_set(curr->se.statistics.exec_max,
+ max(curr->se.statistics.exec_max, delta_exec));
+
+ curr->se.sum_exec_runtime += delta_exec;
+ account_group_exec_runtime(curr, delta_exec);
+
+ curr->se.exec_start = rq->clock_task;
+ cpuacct_charge(curr, delta_exec);
}

static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)
@@ -60,6 +77,9 @@ static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)

static void set_curr_task_stop(struct rq *rq)
{
+ struct task_struct *stop = rq->stop;
+
+ stop->se.exec_start = rq->clock_task;
}

static void switched_to_stop(struct rq *rq, struct task_struct *p)


2012-08-03 20:39:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [patch] sched: fix migration thread runtime bogosity

On Fri, 2012-08-03 at 15:42 +0200, Mike Galbraith wrote:
> Greetings,
>
> I have two bug reports of absurd migration thread CPU usage, one of them
> with a link to a bisection..
>
> https://bugs.gentoo.org/show_bug.cgi?id=394487
>
> ..fingering d670ec13 - posix-cpu-timers: Cure SMP wobbles
>
> I reproduced with my -rt kernel and 3.4, but didn't manage to reproduce
> with the 3.0 NOPREEMPT kernel it was reported against.

Ah, I've seen similar reports, never managed to reproduce though.


> Signed-off-by: Mike Galbraith <[email protected]>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 82ad284..82a78a6 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -974,6 +974,13 @@ void sched_set_stop_task(int cpu, struct task_struct *stop)
> sched_setscheduler_nocheck(stop, SCHED_FIFO, &param);
>
> stop->sched_class = &stop_sched_class;
> +
> + /* Zero stale values for our non-accountable thread. */
> + stop->se.exec_start = 0;
> + stop->se.sum_exec_runtime = 0;
> + stop->se.prev_sum_exec_runtime = 0;
> + stop->stime = stop->stimescaled = 0;
> + stop->nvcsw = stop->nivcsw = 0;
> }
>
> cpu_rq(cpu)->stop = stop;


Now the question is, how did that stop thing get any time to begin with?
Are we hotplugging or somesuch sillyness?


Anyway, I think I like B best, could you re-submit as a proper patch so
I can press the magic button that queues stuff?

2012-08-04 03:44:21

by Mike Galbraith

[permalink] [raw]
Subject: Re: [patch] sched: fix migration thread runtime bogosity

On Fri, 2012-08-03 at 22:39 +0200, Peter Zijlstra wrote:

> Now the question is, how did that stop thing get any time to begin with?
> Are we hotplugging or somesuch sillyness?

Nope, high frequency exec.

> Anyway, I think I like B best, could you re-submit as a proper patch so
> I can press the magic button that queues stuff?

Ok, B it is. Since that SUSE guy munged my mailboxes again (twit), he
can write the changelog, and take the blame.

-Mike

From: Mike Galbraith <[email protected]>

sched: fix migration thread runtime bogosity

Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained ->se.exec_start. The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU PID USER CMD
99.3 45 root [migration/10]
97.7 53 root [migration/12]
97.0 57 root [migration/13]
90.1 49 root [migration/11]
89.6 65 root [migration/15]
88.7 17 root [migration/3]
80.4 37 root [migration/8]
78.1 41 root [migration/9]
44.2 13 root [migration/2]

Signed-off-by: Mike Galbraith <[email protected]>

diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 7b386e8..da5eb5b 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -27,8 +27,10 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
{
struct task_struct *stop = rq->stop;

- if (stop && stop->on_rq)
+ if (stop && stop->on_rq) {
+ stop->se.exec_start = rq->clock_task;
return stop;
+ }

return NULL;
}
@@ -52,6 +54,21 @@ static void yield_task_stop(struct rq *rq)

static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
{
+ struct task_struct *curr = rq->curr;
+ u64 delta_exec;
+
+ delta_exec = rq->clock_task - curr->se.exec_start;
+ if (unlikely((s64)delta_exec < 0))
+ delta_exec = 0;
+
+ schedstat_set(curr->se.statistics.exec_max,
+ max(curr->se.statistics.exec_max, delta_exec));
+
+ curr->se.sum_exec_runtime += delta_exec;
+ account_group_exec_runtime(curr, delta_exec);
+
+ curr->se.exec_start = rq->clock_task;
+ cpuacct_charge(curr, delta_exec);
}

static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)
@@ -60,6 +77,9 @@ static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)

static void set_curr_task_stop(struct rq *rq)
{
+ struct task_struct *stop = rq->stop;
+
+ stop->se.exec_start = rq->clock_task;
}

static void switched_to_stop(struct rq *rq, struct task_struct *p)

2012-08-13 16:54:20

by Mike Galbraith

[permalink] [raw]
Subject: [tip:sched/urgent] sched: Fix migration thread runtime bogosity

Commit-ID: 8f6189684eb4e85e6c593cd710693f09c944450a
Gitweb: http://git.kernel.org/tip/8f6189684eb4e85e6c593cd710693f09c944450a
Author: Mike Galbraith <[email protected]>
AuthorDate: Sat, 4 Aug 2012 05:44:14 +0200
Committer: Thomas Gleixner <[email protected]>
CommitDate: Mon, 13 Aug 2012 18:41:55 +0200

sched: Fix migration thread runtime bogosity

Make stop scheduler class do the same accounting as other classes,

Migration threads can be caught in the act while doing exec balancing,
leading to the below due to use of unmaintained ->se.exec_start. The
load that triggered this particular instance was an apparently out of
control heavily threaded application that does system monitoring in
what equated to an exec bomb, with one of the VERY frequently migrated
tasks being ps.

%CPU PID USER CMD
99.3 45 root [migration/10]
97.7 53 root [migration/12]
97.0 57 root [migration/13]
90.1 49 root [migration/11]
89.6 65 root [migration/15]
88.7 17 root [migration/3]
80.4 37 root [migration/8]
78.1 41 root [migration/9]
44.2 13 root [migration/2]

Signed-off-by: Mike Galbraith <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
---
kernel/sched/stop_task.c | 22 +++++++++++++++++++++-
1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 7b386e8..da5eb5b 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -27,8 +27,10 @@ static struct task_struct *pick_next_task_stop(struct rq *rq)
{
struct task_struct *stop = rq->stop;

- if (stop && stop->on_rq)
+ if (stop && stop->on_rq) {
+ stop->se.exec_start = rq->clock_task;
return stop;
+ }

return NULL;
}
@@ -52,6 +54,21 @@ static void yield_task_stop(struct rq *rq)

static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
{
+ struct task_struct *curr = rq->curr;
+ u64 delta_exec;
+
+ delta_exec = rq->clock_task - curr->se.exec_start;
+ if (unlikely((s64)delta_exec < 0))
+ delta_exec = 0;
+
+ schedstat_set(curr->se.statistics.exec_max,
+ max(curr->se.statistics.exec_max, delta_exec));
+
+ curr->se.sum_exec_runtime += delta_exec;
+ account_group_exec_runtime(curr, delta_exec);
+
+ curr->se.exec_start = rq->clock_task;
+ cpuacct_charge(curr, delta_exec);
}

static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)
@@ -60,6 +77,9 @@ static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)

static void set_curr_task_stop(struct rq *rq)
{
+ struct task_struct *stop = rq->stop;
+
+ stop->se.exec_start = rq->clock_task;
}

static void switched_to_stop(struct rq *rq, struct task_struct *p)