2002-02-01 14:19:49

by Erich Focht

[permalink] [raw]
Subject: O(1) J9 scheduler: set_cpus_allowed

Hi Ingo,

the function set_cpus_allowed(task_t *p, unsigned long new_mask)
works "as is" only if called for the task p=current. The appended patch
corrects this and enables e.g. external load balancers to change the
cpus_allowed mask of an arbitrary process.

BTW: how about migrating the definition of the structures runqueue and
prio_array into include/linux/sched.h and exporting the symbol
runqueues? It would help with debugging, monitoring and other
developments.

Thanks,
Best regards,
Erich

--- 2.4.17-IA64-kdb-J9/kernel/sched.c Thu Jan 31 18:39:37 2002
+++ 2.4.17-IA64-kdb-J9ia64/kernel/sched.c Fri Feb 1 16:06:40 2002
@@ -859,16 +868,16 @@

p->cpus_allowed = new_mask;
/*
- * Can the task run on the current CPU? If not then
+ * Can the task run on its current CPU? If not then
* migrate the process off to a proper CPU.
*/
- if (new_mask & (1UL << smp_processor_id()))
+ if (new_mask & (1UL << p->cpu))
return;
#if CONFIG_SMP
- current->state = TASK_UNINTERRUPTIBLE;
- smp_migrate_task(__ffs(new_mask), current);
-
- schedule();
+ p->state = TASK_UNINTERRUPTIBLE;
+ smp_migrate_task(__ffs(new_mask), p);
+ if (p == current)
+ schedule();
#endif
}



2002-02-01 15:10:42

by Ingo Molnar

[permalink] [raw]
Subject: Re: O(1) J9 scheduler: set_cpus_allowed


On Fri, 1 Feb 2002, Erich Focht wrote:

> the function set_cpus_allowed(task_t *p, unsigned long new_mask) works
> "as is" only if called for the task p=current. The appended patch
> corrects this and enables e.g. external load balancers to change the
> cpus_allowed mask of an arbitrary process.

your patch does not solve the problem, the situation is more complex. What
happens if the target task is not 'current' and is running on some other
CPU? If we send the migration interrupt then nothing guarantees that the
task will reschedule anytime soon, so the target CPU will keep spinning
indefinitely. There are other problems too, like crossing calls to
set_cpus_allowed(), etc. Right now set_cpus_allowed() can only be used for
the current task, and must be used by kernel code that knows what it does.

i've put a BUG() into set_cpus_allowed() that prevents 'wrong' usage.

> BTW: how about migrating the definition of the structures runqueue and
> prio_array into include/linux/sched.h and exporting the symbol
> runqueues? It would help with debugging, monitoring and other
> developments.

for debugging you can export it temporarily. Otherwise i consider it a
feature that no scheduler internals are visible externally in any way.

Ingo

2002-02-01 16:16:13

by Erich Focht

[permalink] [raw]
Subject: Re: O(1) J9 scheduler: set_cpus_allowed

Do I understand correctly that there is no clean way right now to change
cpus_allowed of a task (except for current)? In the old scheduler it was
enough to set cpus_allowed and need_resched=1...

On Fri, 1 Feb 2002, Ingo Molnar wrote:

> > the function set_cpus_allowed(task_t *p, unsigned long new_mask) works
> > "as is" only if called for the task p=current. The appended patch
> > corrects this and enables e.g. external load balancers to change the
> > cpus_allowed mask of an arbitrary process.
>
> your patch does not solve the problem, the situation is more complex. What
> happens if the target task is not 'current' and is running on some other
> CPU? If we send the migration interrupt then nothing guarantees that the
> task will reschedule anytime soon, so the target CPU will keep spinning
> indefinitely. There are other problems too, like crossing calls to
> set_cpus_allowed(), etc. Right now set_cpus_allowed() can only be used for
> the current task, and must be used by kernel code that knows what it does.

I understand your point about crossing calls. Besides this I still think
that set_cpus_allowed() does the job, after all wait_task_inactive() has
to return sometime. But of course, the target CPU shouldn't spend this
time in an interrupt routine.

What if you just 'remember' the task you want to take over on the target
CPU and do the real work later in sched_tick()? After all the task to be
moved has the state TASK_UNINTERRUPTIBLE and should be dequeued soon. In
sched_tick() then one would check whether the array field is NULL instead
of wait_task_inactive()...


> > BTW: how about migrating the definition of the structures runqueue and
> > prio_array into include/linux/sched.h and exporting the symbol
> > runqueues? It would help with debugging, monitoring and other
> > developments.
>
> for debugging you can export it temporarily. Otherwise i consider it a
> feature that no scheduler internals are visible externally in any way.

Hmmm, some things could be useful to see from outside. rq->nr_running, for
example.

Regards,

Erich

2002-02-03 12:06:30

by Ingo Molnar

[permalink] [raw]
Subject: Re: O(1) J9 scheduler: set_cpus_allowed


On Fri, 1 Feb 2002, Erich Focht wrote:

> Do I understand correctly that there is no clean way right now to
> change cpus_allowed of a task (except for current)? In the old
> scheduler it was enough to set cpus_allowed and need_resched=1...

well, there is a way, by fixing the current mechanizm. But since nothing
uses it currently it wont get much testing. I only pointed out that the
patch does not solve some of the races.

Ingo