Hi,
there is a potential lockup problem with the current
set_cpus_allowed() mechanism. As the comment right before the function
suggests, one should make sure that the task referenced is valid and that
it doesn't exit prematurely. Therefore in a pseudo-device driver used for
changing the cpus_allowed masks of processes I do:
read_lock(&tasklist_lock);
task = find_task_by_pid(pid);
if (task)
set_cpus_allowed(task, new_mask);
read_unlock(&tasklist_lock);
This is accessed by a user-space program, for example:
% oncpu -c 0 $$
pins the shell to logical CPU #0.
The lockup with the new task migration mechanism can be observed if one
does:
for cpu in 1 3 5 ; do
oncpu -c $cpu $$
run_job.sh > out.$cpu &
done
For cpu=1 everything goes well, the shell is pinned to CPU#1, forks
run_job.sh (which itself forks another process) which is pinned to CPU#1,
too (cpus_allowed is inherited by children).
For cpu=3 the shell forks, execs "oncpu -c 3 $$" (which is pinned to
CPU#1), this one calls the device driver which gets a read_lock of
tasklist_lock. Then set_cpus_allowed() wakes up migration_CPU1 which
migrates the shell to CPU#3. When migration_CPU1 finishes, it reactivates
the "oncpu" process (which was running on CPU#1) but the currently running
process is "run_job.sh" on CPU#1. This one forks and spinlocks on
tasklist_lock, which it will never get because "oncpu" never gets the
chance to read_unlock(&tasklist_lock).
So the problem is that it is not guaranteed that the task calling
set_cpus_allowed() will run right after migration_task() finished its
job.
How about raising the priority in set_cpus_allowed() before the
down(&req.sem);
and resetting it afterwards? Would this help?
Another possible solution would be to take care of the validity of the
task inside set_cpus_allowed(), i.e. pass it a pid as argument, lock the
tasklist there and unlock it in migration_task().
Any ideas, comments?
Best regards,
Erich
> Any ideas, comments?
You could just use get_task_struct/free_task_struct and drop the read lock
to make sure the target doesn't go way
-Andi
On Fri, 5 Apr 2002, Andi Kleen wrote:
> You could just use get_task_struct/free_task_struct and drop the read lock
> to make sure the target doesn't go way
Thanks! That solves my problem. So maybe Ingo just extends the comment to
set_cpus_allowed() to warn from holding the tasklist_lock such that others
don't run into the same problem.
Regards,
Erich