2006-03-30 10:19:44

by Mike Galbraith

[permalink] [raw]
Subject: [rfc][patch] improved interactive starvation patch against 2.6.16

Greetings,

The patch below alone makes virgin 2.6.16 usable in the busy apache
server scenario, and should help quite a bit with other situations as
well.

The original version helps a lot, but not enough, and the latency of
being awakened in the expired array can be needlessly painful. Ergo,
speed up the array switch, and instead of unconditionally plunking all
awakening tasks (including rt tasks, oops) into the expired array, check
to see if the task has run since the last array switch first. This
leaves a theoretical hole for a stream of one-time waking tasks to
starve the expired array indefinitely, but it deals with the real
problem pretty nicely I think.

For the one or two folks on the planet testing my anti-starvation
patches, I've attached an incremental to my 2.6.16 test release.

-Mike

--- linux-2.6.16/kernel/sched.c.org 2006-03-30 11:05:31.000000000 +0200
+++ linux-2.6.16/kernel/sched.c 2006-03-30 11:10:02.000000000 +0200
@@ -77,6 +77,20 @@
#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
#define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))

+#if (BITS_PER_LONG < 64)
+#define JIFFIES_TO_NS64(TIME) \
+ ((unsigned long long)(TIME) * ((unsigned long) (1000000000 / HZ)))
+
+#define NS64_TO_JIFFIES(TIME) \
+ ((((unsigned long long)((TIME)) >> BITS_PER_LONG) * \
+ (1 + NS_TO_JIFFIES(~0UL))) + NS_TO_JIFFIES((unsigned long)(TIME)))
+#else /* BITS_PER_LONG < 64 */
+
+#define NS64_TO_JIFFIES(TIME) NS_TO_JIFFIES(TIME)
+#define JIFFIES_TO_NS64(TIME) JIFFIES_TO_NS(TIME)
+
+#endif /* BITS_PER_LONG < 64 */
+
/*
* These are the 'tuning knobs' of the scheduler:
*
@@ -220,7 +234,7 @@
*/
unsigned long nr_uninterruptible;

- unsigned long expired_timestamp;
+ unsigned long long expired_timestamp;
unsigned long long timestamp_last_tick;
task_t *curr, *idle;
struct mm_struct *prev_mm;
@@ -662,11 +676,60 @@
}

/*
+ * We place interactive tasks back into the active array, if possible.
+ *
+ * To guarantee that this does not starve expired tasks we ignore the
+ * interactivity of a task if the first expired task had to wait more
+ * than a 'reasonable' amount of time. This deadline timeout is
+ * load-dependent, as the frequency of array switched decreases with
+ * increasing number of running tasks. We also ignore the interactivity
+ * if a better static_prio task has expired, and switch periodically
+ * regardless, to ensure that highly interactive tasks do not starve
+ * the less fortunate for unreasonably long periods.
+ */
+static inline int expired_starving(runqueue_t *rq)
+{
+ int limit;
+
+ /*
+ * Arrays were recently switched, all is well.
+ */
+ if (!rq->expired_timestamp)
+ return 0;
+
+ limit = STARVATION_LIMIT * rq->nr_running;
+
+ /*
+ * It's time to switch arrays.
+ */
+ if (jiffies - NS64_TO_JIFFIES(rq->expired_timestamp) >= limit)
+ return 1;
+
+ /*
+ * There's a better selection in the expired array.
+ */
+ if (rq->curr->static_prio > rq->best_expired_prio)
+ return 1;
+
+ /*
+ * All is well.
+ */
+ return 0;
+}
+
+/*
* __activate_task - move a task to the runqueue.
*/
static inline void __activate_task(task_t *p, runqueue_t *rq)
{
- enqueue_task(p, rq->active);
+ prio_array_t *array = rq->active;
+ if (unlikely(expired_starving(rq) && !rt_task(p) &&
+ p->last_ran > rq->expired_timestamp)) {
+ array = rq->expired;
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
+ }
+ enqueue_task(p, array);
rq->nr_running++;
}

@@ -2461,22 +2524,6 @@
}

/*
- * We place interactive tasks back into the active array, if possible.
- *
- * To guarantee that this does not starve expired tasks we ignore the
- * interactivity of a task if the first expired task had to wait more
- * than a 'reasonable' amount of time. This deadline timeout is
- * load-dependent, as the frequency of array switched decreases with
- * increasing number of running tasks. We also ignore the interactivity
- * if a better static_prio task has expired:
- */
-#define EXPIRED_STARVING(rq) \
- ((STARVATION_LIMIT && ((rq)->expired_timestamp && \
- (jiffies - (rq)->expired_timestamp >= \
- STARVATION_LIMIT * ((rq)->nr_running) + 1))) || \
- ((rq)->curr->static_prio > (rq)->best_expired_prio))
-
-/*
* Account user cpu time to a process.
* @p: the process that the cpu time gets accounted to
* @hardirq_offset: the offset to subtract from hardirq_count()
@@ -2574,12 +2621,12 @@
return;
}

+ spin_lock(&rq->lock);
/* Task might have expired already, but not scheduled off yet */
if (p->array != rq->active) {
set_tsk_need_resched(p);
- goto out;
+ goto out_unlock;
}
- spin_lock(&rq->lock);
/*
* The task was running during this tick - update the
* time slice counter. Note: we do not update a thread's
@@ -2610,15 +2657,38 @@
p->first_time_slice = 0;

if (!rq->expired_timestamp)
- rq->expired_timestamp = jiffies;
- if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
+ rq->expired_timestamp = now;
+ if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
enqueue_task(p, rq->expired);
- if (p->static_prio < rq->best_expired_prio)
- rq->best_expired_prio = p->static_prio;
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
} else
enqueue_task(p, rq->active);
} else {
/*
+ * If tasks in the expired array are starving, increase the
+ * speed of the array switch. If we do not, tasks which are
+ * awakened on the expired array may suffer severe latency
+ * due to cpu hogs using their full slice. We don't want to
+ * switch too fast however, because it may well be these very
+ * tasks which were causing starvation to begin with.
+ */
+ if (expired_starving(rq)) {
+ int limit = MIN_TIMESLICE + CURRENT_BONUS(p);
+ int runtime = now - p->timestamp;
+
+ runtime = NS_TO_JIFFIES(runtime);
+ if (runtime >= limit && p->time_slice >= limit) {
+
+ dequeue_task(p, rq->active);
+ enqueue_task(p, rq->expired);
+ set_tsk_need_resched(p);
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
+ }
+ }
+
+ /*
* Prevent a too long timeslice allowing a task to monopolize
* the CPU. We do this by splitting up the timeslice into
* smaller pieces.
@@ -2634,10 +2704,9 @@
* This only applies to tasks in the interactive
* delta range with at least TIMESLICE_GRANULARITY to requeue.
*/
- if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
+ else if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
p->time_slice) % TIMESLICE_GRANULARITY(p)) &&
- (p->time_slice >= TIMESLICE_GRANULARITY(p)) &&
- (p->array == rq->active)) {
+ (p->time_slice >= TIMESLICE_GRANULARITY(p))) {

requeue_task(p, rq->active);
set_tsk_need_resched(p);


Attachments:
sched_8_interactive_starvation2.diff (3.23 kB)

2006-03-30 11:07:39

by Willy Tarreau

[permalink] [raw]
Subject: Re: [rfc][patch] improved interactive starvation patch against 2.6.16

Hi Mike,

On Thu, Mar 30, 2006 at 12:19:57PM +0200, Mike Galbraith wrote:
> Greetings,
>
> The patch below alone makes virgin 2.6.16 usable in the busy apache
> server scenario, and should help quite a bit with other situations as
> well.
>
> The original version helps a lot, but not enough, and the latency of
> being awakened in the expired array can be needlessly painful. Ergo,
> speed up the array switch, and instead of unconditionally plunking all
> awakening tasks (including rt tasks, oops) into the expired array, check
> to see if the task has run since the last array switch first. This
> leaves a theoretical hole for a stream of one-time waking tasks to
> starve the expired array indefinitely, but it deals with the real
> problem pretty nicely I think.

Interesting.

> For the one or two folks on the planet testing my anti-starvation
> patches, I've attached an incremental to my 2.6.16 test release.

Thanks, I'll test this ASAP, though I'm clearly busy right now.

Cheers,
Willy

2006-03-30 11:56:50

by Willy Tarreau

[permalink] [raw]
Subject: Re: [rfc][patch] improved interactive starvation patch against 2.6.16

On Thu, Mar 30, 2006 at 12:19:57PM +0200, Mike Galbraith wrote:
> Greetings,
>
> The patch below alone makes virgin 2.6.16 usable in the busy apache
> server scenario, and should help quite a bit with other situations as
> well.
>
> The original version helps a lot, but not enough, and the latency of
> being awakened in the expired array can be needlessly painful. Ergo,
> speed up the array switch, and instead of unconditionally plunking all
> awakening tasks (including rt tasks, oops) into the expired array, check
> to see if the task has run since the last array switch first. This
> leaves a theoretical hole for a stream of one-time waking tasks to
> starve the expired array indefinitely, but it deals with the real
> problem pretty nicely I think.
>
> For the one or two folks on the planet testing my anti-starvation
> patches, I've attached an incremental to my 2.6.16 test release.

Hmmm does not apply here :

willy@wtap:linux-2.6.16-sched25$ cat /data/src/tmp/sched_*|patch -Nsp1
willy@wtap:linux-2.6.16-sched25$ patch -p1 < /data/src/tmp/patch-2.6.16-sched-2.txt
patching file kernel/sched.c
Hunk #1 succeeded at 71 with fuzz 2 (offset -6 lines).
Hunk #2 succeeded at 391 (offset 157 lines).
Hunk #3 FAILED at 833.
Hunk #4 FAILED at 2681.
Hunk #5 succeeded at 2770 (offset 149 lines).
Hunk #6 FAILED at 2806.
Hunk #7 succeeded at 2866 (offset 162 lines).
3 out of 7 hunks FAILED -- saving rejects to file kernel/sched.c.rej

However, it applies to plain 2.6.16 with sched_[1-6] applied (but with fuzz).
Parts of the patches are already applied by your previous patches (eg: chunk1).
I suspect that you accidentely rediffed sched.c against an intermediate
working tree instead.

For instance, this part is already provided by patch 4 :
@@ -77,6 +77,21 @@
#define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
#define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))

+#if (BITS_PER_LONG < 64)
+#define JIFFIES_TO_NS64(TIME) \
.../...

And this part overlaps with a previous chunk in patch 7 :

- rq->expired_timestamp = jiffies;
- if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
+ rq->expired_timestamp = now;
+ if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {

Regards,
Willy

2006-03-30 14:22:21

by Mike Galbraith

[permalink] [raw]
Subject: Re: [rfc][patch] improved interactive starvation patch against 2.6.16

On Thu, 2006-03-30 at 13:55 +0200, Willy Tarreau wrote:
> On Thu, Mar 30, 2006 at 12:19:57PM +0200, Mike Galbraith wrote:

> > For the one or two folks on the planet testing my anti-starvation
> > patches, I've attached an incremental to my 2.6.16 test release.
>
> Hmmm does not apply here :
>
> willy@wtap:linux-2.6.16-sched25$ cat /data/src/tmp/sched_*|patch -Nsp1
> willy@wtap:linux-2.6.16-sched25$ patch -p1 < /data/src/tmp/patch-2.6.16-sched-2.txt
> patching file kernel/sched.c
> Hunk #1 succeeded at 71 with fuzz 2 (offset -6 lines).
> Hunk #2 succeeded at 391 (offset 157 lines).
> Hunk #3 FAILED at 833.
> Hunk #4 FAILED at 2681.
> Hunk #5 succeeded at 2770 (offset 149 lines).
> Hunk #6 FAILED at 2806.
> Hunk #7 succeeded at 2866 (offset 162 lines).
> 3 out of 7 hunks FAILED -- saving rejects to file kernel/sched.c.rej
>
> However, it applies to plain 2.6.16 with sched_[1-6] applied (but with fuzz).
> Parts of the patches are already applied by your previous patches (eg: chunk1).
> I suspect that you accidentely rediffed sched.c against an intermediate
> working tree instead.
>
> For instance, this part is already provided by patch 4 :
> @@ -77,6 +77,21 @@
> #define NS_TO_JIFFIES(TIME) ((TIME) / (1000000000 / HZ))
> #define JIFFIES_TO_NS(TIME) ((TIME) * (1000000000 / HZ))
>
> +#if (BITS_PER_LONG < 64)
> +#define JIFFIES_TO_NS64(TIME) \
> .../...
>
> And this part overlaps with a previous chunk in patch 7 :
>
> - rq->expired_timestamp = jiffies;
> - if (!TASK_INTERACTIVE(p) || EXPIRED_STARVING(rq)) {
> + rq->expired_timestamp = now;
> + if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {

You applied the wrong patch. The inlined one was against virgin 2.6.16,
for folks to comment on, and the attached was the incremental against
the 2.6.16 anti-starvation tree.

Ah, your mailer probably showed both as attached. Anyway, here's the
incremental again inlined.

(note to self: never include patches for two trees in same message)

-Mike

--- linux-2.6.16/kernel/sched.c-7.interactive_starvation 2006-03-27 06:11:01.000000000 +0200
+++ linux-2.6.16/kernel/sched.c 2006-03-30 10:50:46.000000000 +0200
@@ -371,7 +371,7 @@
*/
unsigned long nr_uninterruptible;

- unsigned long expired_timestamp;
+ unsigned long long expired_timestamp;
unsigned long long timestamp_last_tick;
task_t *curr, *idle;
struct mm_struct *prev_mm;
@@ -839,7 +839,7 @@
/*
* It's time to switch arrays.
*/
- if (jiffies - rq->expired_timestamp >= limit)
+ if (jiffies - NS64_TO_JIFFIES(rq->expired_timestamp) >= limit)
return 1;

/*
@@ -860,8 +860,12 @@
static inline void __activate_task(task_t *p, runqueue_t *rq)
{
prio_array_t *array = rq->active;
- if (unlikely(expired_starving(rq)))
+ if (unlikely(expired_starving(rq) && !rt_task(p) &&
+ p->last_ran > rq->expired_timestamp)) {
array = rq->expired;
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
+ }
enqueue_task(p, array);
rq->nr_running++;
}
@@ -2880,12 +2884,12 @@
return;
}

+ spin_lock(&rq->lock);
/* Task might have expired already, but not scheduled off yet */
if (p->array != rq->active) {
set_tsk_need_resched(p);
- goto out;
+ goto out_unlock;
}
- spin_lock(&rq->lock);
/*
* The task was running during this tick - update the
* time slice counter. Note: we do not update a thread's
@@ -2921,15 +2925,38 @@
p->prio = effective_prio(p);

if (!rq->expired_timestamp)
- rq->expired_timestamp = jiffies;
+ rq->expired_timestamp = now;
if (!TASK_INTERACTIVE(p) || expired_starving(rq)) {
enqueue_task(p, rq->expired);
- if (p->static_prio < rq->best_expired_prio)
- rq->best_expired_prio = p->static_prio;
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
} else
enqueue_task(p, rq->active);
} else {
/*
+ * If tasks in the expired array are starving, increase the
+ * speed of the array switch. If we do not, tasks which are
+ * awakened on the expired array may suffer severe latency
+ * due to cpu hogs using their full slice. We don't want to
+ * switch too fast however, because it may well be these very
+ * tasks which were causing starvation to begin with.
+ */
+ if (expired_starving(rq)) {
+ int limit = MIN_TIMESLICE + CURRENT_BONUS(p);
+ int runtime = now - p->timestamp;
+
+ runtime = NS_TO_JIFFIES(runtime);
+ if (runtime >= limit && p->time_slice >= limit) {
+
+ dequeue_task(p, rq->active);
+ enqueue_task(p, rq->expired);
+ set_tsk_need_resched(p);
+ if (p->prio < rq->best_expired_prio)
+ rq->best_expired_prio = p->prio;
+ }
+ }
+
+ /*
* Prevent a too long timeslice allowing a task to monopolize
* the CPU. We do this by splitting up the timeslice into
* smaller pieces.
@@ -2945,10 +2972,9 @@
* This only applies to tasks in the interactive
* delta range with at least TIMESLICE_GRANULARITY to requeue.
*/
- if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
+ else if (TASK_INTERACTIVE(p) && !((task_timeslice(p) -
p->time_slice) % TIMESLICE_GRANULARITY(p)) &&
- (p->time_slice >= TIMESLICE_GRANULARITY(p)) &&
- (p->array == rq->active)) {
+ (p->time_slice >= TIMESLICE_GRANULARITY(p))) {

requeue_task(p, rq->active);
set_tsk_need_resched(p);


2006-03-30 14:53:43

by Willy Tarreau

[permalink] [raw]
Subject: Re: [rfc][patch] improved interactive starvation patch against 2.6.16

On Thu, Mar 30, 2006 at 04:22:38PM +0200, Mike Galbraith wrote:

> You applied the wrong patch. The inlined one was against virgin 2.6.16,
> for folks to comment on, and the attached was the incremental against
> the 2.6.16 anti-starvation tree.
>
> Ah, your mailer probably showed both as attached. Anyway, here's the
> incremental again inlined.

Ah OK, it's not even the mailer, it's me. I read the message and did not
notice the attachment, just the inline patch. Then I forwarded it to my
work address and lost the attachment. You know the rest ...

> (note to self: never include patches for two trees in same message)

Good advice. Or manually edit the directory name in the diff to
explicitly show what it refers to. Eg:

--- linux-2.6.16-sched7/kernel/sched.c 2006-03-27 06:11:01.000000000 +0200
+++ linux-2.6.16/kernel/sched.c 2006-03-30 10:50:46.000000000 +0200

Anyway, thanks for your reply, I'll re-apply it and recompile.

> -Mike

Cheers,
Willy

2006-03-30 18:23:09

by Mike Galbraith

[permalink] [raw]
Subject: Re: [rfc][patch] improved interactive starvation patch against 2.6.16

On Thu, 2006-03-30 at 16:22 +0200, Mike Galbraith wrote:
> + if (expired_starving(rq)) {
> + int limit = MIN_TIMESLICE + CURRENT_BONUS(p);
> + int runtime = now - p->timestamp;
> +
> + runtime = NS_TO_JIFFIES(runtime);
> + if (runtime >= limit && p->time_slice >= limit) {
> +
> + dequeue_task(p, rq->active);
> + enqueue_task(p, rq->expired);
> + set_tsk_need_resched(p);
> + if (p->prio < rq->best_expired_prio)
> + rq->best_expired_prio = p->prio;
> + }
> + }

This bit isn't cutting it. expired_starving() is routine enough that
short slicing (on this scale at least) is noticed. Don't waste your
time with this one.

-Mike