2003-02-17 08:04:48

by Zwane Mwaikambo

Subject: [PATCH][2.5] Don't wake up tasks on offline processors

This patch stops tasks from being woken up on offline processors. We need
this when migrating tasks from offline processors onto other online ones
and to avert a livelock whilst doing so.

Index: linux-2.5.61-trojan/kernel/sched.c
===================================================================
RCS file: /build/cvsroot/linux-2.5.61/kernel/sched.c,v
retrieving revision 1.1.1.1
diff -u -r1.1.1.1 sched.c
--- linux-2.5.61-trojan/kernel/sched.c 15 Feb 2003 12:32:44 -0000 1.1.1.1
+++ linux-2.5.61-trojan/kernel/sched.c 15 Feb 2003 16:04:51 -0000
@@ -465,7 +473,8 @@
* Fast-migrate the task if it's not running or runnable
* currently. Do not violate hard affinity.
*/
- if (unlikely(sync && !task_running(rq, p) &&
+ if (likely(cpu_online(smp_processor_id())) &&
+ unlikely(sync && !task_running(rq, p) &&
(task_cpu(p) != smp_processor_id()) &&
(p->cpus_allowed & (1UL << smp_processor_id())))) {
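For readers following along outside the kernel tree, the effect of the added cpu_online() test can be modelled in plain C. This is only a user-space sketch: online_mask, cpu_is_online(), struct task and may_fast_migrate() are made-up names for illustration, not kernel interfaces, and the real condition works on the runqueue and task_struct.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 64   /* hypothetical limit for this model */

/* Hypothetical stand-in for the online map: bit n set means CPU n is online. */
static unsigned long long online_mask = ~0ULL;

static bool cpu_is_online(unsigned int cpu)
{
    assert(cpu < MAX_CPUS);
    return (online_mask >> cpu) & 1ULL;
}

/* Hypothetical, simplified task: only the fields the check needs. */
struct task {
    unsigned int cpu;                /* CPU the task currently belongs to */
    unsigned long long cpus_allowed; /* hard affinity mask */
    bool running;                    /* currently executing on its CPU */
};

/* Models the patched condition: pull the task to waking_cpu only if that
   CPU is online, the wakeup is synchronous, the task is not running, it
   sits on a different CPU, and its affinity mask permits the move. */
static bool may_fast_migrate(const struct task *p, unsigned int waking_cpu,
                             bool sync)
{
    if (!cpu_is_online(waking_cpu))
        return false;
    return sync && !p->running && p->cpu != waking_cpu &&
           ((p->cpus_allowed >> waking_cpu) & 1ULL);
}
```

With the online test first, a waker running on a CPU that has already been marked offline never pulls a task towards itself, which is the behaviour the patch is after.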



2003-02-17 14:20:56

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] Don't wake up tasks on offline processors

On Mon, 17 Feb 2003, Ingo Molnar wrote:

>
> On Mon, 17 Feb 2003, Zwane Mwaikambo wrote:
>
> > This patch stops tasks from being woken up on offline processors. We
> > need this when migrating tasks from offline processors onto other online
> > ones and to avert a livelock whilst doing so.
>
> this code too should be done in a separate 'zap_runqueue()' function,
> which also needs to iterate through all tasks and migrate them off to an
> online CPU. This code definitely does not belong in the wakeup hotpath.

This is the current code to migrate tasks off a dead cpu;

diff -u -r1.1.1.1 sched.c
--- linux-2.5.61-trojan/kernel/sched.c 15 Feb 2003 12:32:44 -0000 1.1.1.1
+++ linux-2.5.61-trojan/kernel/sched.c 17 Feb 2003 06:47:05 -0000
@@ -2235,6 +2253,102 @@
wait_for_completion(&req.done);
}

+/* Move (not current) task off this cpu, onto dest cpu. Reference to
+ task must be held. */
+static void move_task_away(struct task_struct *p, unsigned int dest_cpu)
+{
+ runqueue_t *rq_dest;
+ unsigned long flags;
+
+ rq_dest = cpu_rq(dest_cpu);
+
+ if (task_cpu(p) != smp_processor_id())
+ return; /* Already moved */
+
+ local_irq_save(flags);
+ double_rq_lock(this_rq(), rq_dest);
+ if (task_cpu(p) != smp_processor_id())
+ goto out; /* Already moved */
+
+ set_task_cpu(p, dest_cpu);
+ if (p->array) {
+ deactivate_task(p, this_rq());
+ activate_task(p, rq_dest);
+ if (p->prio < rq_dest->curr->prio)
+ resched_task(rq_dest->curr);
+ }
+ out:
+ double_rq_unlock(this_rq(), rq_dest);
+ local_irq_restore(flags);
+}
+
+#if defined(CONFIG_HOTPLUG) && defined(CONFIG_SMP)
+/* Slow but sure. We don't fight against load_balance, new people
+ setting affinity, or try_to_wake_up's fast path pulling things in,
+ as cpu_online() no longer true. */
+static int move_all_tasks(unsigned int kill_it)
+{
+ unsigned int num_signalled = 0;
+ int dest_cpu; /* signed, so the dest_cpu < 0 test below can fire */
+ struct task_struct *g, *t;
+ unsigned long cpus_allowed;
+
+ again:
+ read_lock(&tasklist_lock);
+ do_each_thread(g, t) {
+ if (t == current)
+ continue;
+
+ /* Kernel threads which are bound to specific
+ processors need to look after themselves
+ with their own callbacks */
+ if (t->mm == NULL && t->cpus_allowed != ~0UL)
+ continue;
+
+ if (task_cpu(t) == smp_processor_id()) {
+ get_task_struct(t);
+ goto move_one;
+ }
+ } while_each_thread(g, t);
+ read_unlock(&tasklist_lock);
+ return num_signalled;
+
+ move_one:
+ read_unlock(&tasklist_lock);
+ cpus_allowed = t->cpus_allowed & ~(1UL << smp_processor_id());
+ dest_cpu = any_online_cpu(cpus_allowed);
+ if (dest_cpu < 0) {
+ num_signalled++;
+ if (!kill_it) {
+ /* FIXME: New signal needed? --RR */
+ force_sig(SIGPWR, t);
+ goto again;
+ }
+ /* Kill it (it can die on any CPU). */
+ t->cpus_allowed = ~(1UL << smp_processor_id());
+ dest_cpu = any_online_cpu(t->cpus_allowed);
+ force_sig(SIGKILL, t);
+ }
+ move_task_away(t, dest_cpu);
+ put_task_struct(t);
+ goto again;
+}
+
+/* Move non-kernel-thread tasks off this (offline) CPU, except us. */
+void migrate_all_tasks(void)
+{
+ if (move_all_tasks(0)) {
+ /* Wait for processes to react to signal */
+ schedule_timeout(30*HZ);
+ move_all_tasks(1);
+ }
+}
+#endif /* CONFIG_HOTPLUG */
+
+/* This is the CPU to stop, and who to wake about it */
+static int migration_stop = -1;
+static struct completion migration_stopped;
+
/*
* migration_thread - this is a highprio system thread that performs
* thread migration by 'pulling' threads into the target runqueue.
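The destination choice in move_all_tasks() can be tried out in isolation. The following is a user-space sketch under assumptions: first_online_cpu() and pick_dest_cpu() are hypothetical helpers, the online map is passed in as a plain bitmask, and a return of -1 for "no CPU found" is assumed here rather than taken from the real any_online_cpu(). Note that a "< 0" test only works if the result is kept in a signed variable.

```c
#include <assert.h>

#define NR_CPUS_MODEL 64   /* hypothetical limit for this model */

/* Return the lowest-numbered CPU set in both masks, or -1 if there is none.
   The return type is signed so that callers can test for -1. */
static int first_online_cpu(unsigned long long allowed,
                            unsigned long long online)
{
    unsigned long long candidates = allowed & online;

    for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
        if ((candidates >> cpu) & 1ULL)
            return cpu;
    return -1;
}

/* Choose where a task leaving dead_cpu should go. *widened is set to 1
   when the task's affinity had to be overridden, which corresponds to the
   "kill it" branch in the patch. Returns -1 if no CPU is online at all. */
static int pick_dest_cpu(unsigned long long cpus_allowed,
                         unsigned long long online,
                         unsigned int dead_cpu, int *widened)
{
    assert(dead_cpu < NR_CPUS_MODEL);

    unsigned long long dead_bit = 1ULL << dead_cpu;
    int dest = first_online_cpu(cpus_allowed & ~dead_bit, online);

    *widened = 0;
    if (dest < 0) {
        /* No permitted CPU is online: fall back to any CPU but the dead one. */
        *widened = 1;
        dest = first_online_cpu(~dead_bit, online);
    }
    return dest;
}
```

The mask arithmetic uses 1ULL throughout so that the shift stays well defined for CPU numbers of 32 and above.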

--
function.linuxpower.ca

2003-02-17 14:23:11

by Ingo Molnar

Subject: Re: [PATCH][2.5] Don't wake up tasks on offline processors


On Mon, 17 Feb 2003, Zwane Mwaikambo wrote:

> This is the current code to migrate tasks off a dead cpu;

looks good in principle, but to avoid races i'd rather suggest to lock
_all_ runqueues in one big swoop, and then just move everything as
appropriate. It's not like this code has to be highly effective.

Ingo

2003-02-17 14:32:13

by Zwane Mwaikambo

Subject: Re: [PATCH][2.5] Don't wake up tasks on offline processors

On Mon, 17 Feb 2003, Ingo Molnar wrote:

>
> On Mon, 17 Feb 2003, Zwane Mwaikambo wrote:
>
> > This is the current code to migrate tasks off a dead cpu;
>
> looks good in principle, but to avoid races i'd rather suggest to lock
> _all_ runqueues in one big swoop, and then just move everything as
> appropriate. It's not like this code has to be highly effective.

Ok i'll have a go at that instead, however how hard would it be to do a
multiple lock acquisition of that magnitude on 16+ cpus?

Thanks,
Zwane
--
function.linuxpower.ca

2003-02-17 14:27:39

by Ingo Molnar

Subject: Re: [PATCH][2.5] Don't wake up tasks on offline processors


On Mon, 17 Feb 2003, Zwane Mwaikambo wrote:

> This patch stops tasks from being woken up on offline processors. We
> need this when migrating tasks from offline processors onto other online
> ones and to avert a livelock whilst doing so.

this code too should be done in a separate 'zap_runqueue()' function,
which also needs to iterate through all tasks and migrate them off to an
online CPU. This code definitely does not belong in the wakeup hotpath.

Ingo
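
The shape of such a function can be sketched in user space. This is not the kernel implementation: zap_runqueue_model(), struct model_rq, struct model_task and model_any_online() are hypothetical names, the runqueue is reduced to a small array, and the caller is assumed to already hold whatever locks protect the queues.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_CPUS 4        /* hypothetical machine size */
#define MODEL_MAX_TASKS 16  /* hypothetical per-queue capacity */

struct model_task {
    int id;
    unsigned long cpus_allowed; /* bit n set: task may run on CPU n */
};

struct model_rq {
    int online;                 /* 1 if the CPU is online */
    size_t nr;                  /* number of queued tasks */
    struct model_task *tasks[MODEL_MAX_TASKS];
};

static struct model_rq model_rqs[MODEL_CPUS];

/* Lowest-numbered online CPU permitted by mask, or -1 if there is none. */
static int model_any_online(unsigned long mask)
{
    for (int cpu = 0; cpu < MODEL_CPUS; cpu++)
        if (model_rqs[cpu].online && ((mask >> cpu) & 1UL))
            return cpu;
    return -1;
}

/* Drain the queue of an offline CPU in one pass. Tasks with no permitted
   online CPU (or whose destination queue is full) stay put and are
   counted, so the caller can decide what to do with them.
   Returns the number of tasks left behind. */
static size_t zap_runqueue_model(int dead_cpu)
{
    struct model_rq *src = &model_rqs[dead_cpu];
    size_t kept = 0;

    assert(!src->online);
    for (size_t i = 0; i < src->nr; i++) {
        struct model_task *t = src->tasks[i];
        int dest = model_any_online(t->cpus_allowed);

        if (dest < 0 || model_rqs[dest].nr == MODEL_MAX_TASKS) {
            src->tasks[kept++] = t; /* compact the survivors in place */
            continue;
        }
        model_rqs[dest].tasks[model_rqs[dest].nr++] = t;
    }
    src->nr = kept;
    return kept;
}
```

Because the walk happens once, at offline time, nothing extra is needed on the wakeup path in this model.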

2003-02-17 14:44:21

by Ingo Molnar

Subject: Re: [PATCH][2.5] Don't wake up tasks on offline processors


On Mon, 17 Feb 2003, Zwane Mwaikambo wrote:

> Ok i'll have a go at that instead, however how hard would it be to do a
> multiple lock acquisition of that magnitude on 16+ cpus?

just do a simple loop of spin_lock()'s over all online CPUs, in forward
order; the runqueue locks are ordered by CPU number.

Ingo
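
The ordering rule can be illustrated with a user-space sketch that uses pthread mutexes in place of runqueue spinlocks. Everything here is an assumption made for illustration: struct rq_model, rq_model_init(), lock_all_rqs() and unlock_all_rqs() are hypothetical names, and the online flag is read without protection, which a real implementation would have to handle.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define LOCK_MODEL_CPUS 16   /* hypothetical machine size */

struct rq_model {
    pthread_mutex_t lock;    /* stands in for the runqueue spinlock */
    bool online;
};

static struct rq_model rqs[LOCK_MODEL_CPUS];

static void rq_model_init(void)
{
    for (int cpu = 0; cpu < LOCK_MODEL_CPUS; cpu++) {
        pthread_mutex_init(&rqs[cpu].lock, NULL);
        rqs[cpu].online = true;
    }
}

/* Take the lock of every online CPU in ascending CPU order. Any two
   callers that follow the same order cannot deadlock against each other.
   Returns a bitmask of the locks actually taken. */
static unsigned long lock_all_rqs(void)
{
    unsigned long held = 0;

    for (int cpu = 0; cpu < LOCK_MODEL_CPUS; cpu++) {
        if (!rqs[cpu].online)
            continue;
        pthread_mutex_lock(&rqs[cpu].lock);
        held |= 1UL << cpu;
    }
    return held;
}

/* Release exactly the locks recorded in held, in reverse order. */
static void unlock_all_rqs(unsigned long held)
{
    for (int cpu = LOCK_MODEL_CPUS - 1; cpu >= 0; cpu--)
        if (held & (1UL << cpu))
            pthread_mutex_unlock(&rqs[cpu].lock);
}
```

Returning the mask of held locks means the unlock side does not depend on the online flags staying unchanged while the locks are held.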