Paul,
You had once revealed a cute one-line command to move all tasks from
one cpuset to another [1], which was:
# move all tasks from top cpuset to 'foo' cpuset
sed -nu p < /dev/cpuset/tasks > /dev/cpuset/foo/tasks
I somewhat regret now having fallen for it and using it in my scripts.
To my agony, I found that it moves per-cpu kernel threads too, forcibly
breaking their affinity. In my case, rq->migration thread
(kernel/sched.c) was moved off cpu3 and started running on cpu2, which
caused nasty problems for me. I am sure this can lead to problems for
other per-cpu kernel threads, if their assumption of per-cpu'ness is
broken this way.
One could argue that 'root' user did this and nothing wrong in assuming
he knows what he is doing.
But I am wondering if attach_task() should leave kernel threads alone and
act only upon user-space threads. Or maybe allow movement if it doesn't
result in changing kernel-threads's cpu affinity.
Do you have anything to say regarding this?
Fyi, this was what I was doing (as root):
#!/bin/bash
mount -t container -o cpuset none /dev/cpuset
cd /dev/cpuset
mkdir sys # create a cpuset to move all tasks into
mkdir test # test cpuset in which my tests will run
# Assign cpus to both cpusets
cd sys; echo 0-2 > cpus; echo 0 > mems; echo 1 > cpu_exclusive; cd ..
cd test; echo 3 > cpus; echo 0 > mems; echo 1 > cpu_exclusive; cd ..
# Move all tasks to 'sys' cpuset so that cpu3 is dedicated to
# only my chosen tasks
sed -nu p < /dev/cpuset/tasks > /dev/cpuset/tasks
echo $$ > test/tasks
/path_to/test_prg
References:
1. http://marc.info/?l=linux-kernel&m=115627306628524
--
Regards,
vatsa
Srivatsa wrote:
# move all tasks from top cpuset to 'foo' cpuset
sed -nu p < /dev/cpuset/tasks > /dev/cpuset/foo/tasks
Aha - that won't work very well, as you noticed.
In looking through my past email archives, I can see where I have
recommended this trick to move things -into- the top cpuset, but
not to move them -out-of- the top cpuset. I would not (except by
mistake) recommend using the above trick to move tasks out of the
top cpuset, for the very reason you note.
I do have code to move tasks out of the top cpuset in bulk, but it
is written in C, as part of some SGI proprietary product (though
it relies mostly on some LGPL licensed bitmask and cpuset libraries
that I have written for SGI.)
The basic idea is to avoid moving the threads whose 'Cpus_allowed'
value in their /proc/<pid>/status file is a strict subset of (less
than, but -not- equal to) the set of online cpus. The kernel threads
that you don't want to move are those that are pinned to specific
cpus or nodes; they are where they must be for their purposes.
Kernel threads that are not pinned can be moved just as readily
as user threads; they just need to be someplace.
I don't know any easy way to script this. We just don't have the
tools in shell script to conveniently work with sets or bit masks.
... not yet anyway ... see below.
Checking whether a tasks Cpus_allowed is a strict subset of the
set of online cpus may not always be the check needed, depending
on what one is doing. But it matched my needs, and from the looks
of what you're doing in that script, making the sys and test cpusets,
it might well match your needs as well, as your needs look rather
similar to what mine were.
> But I am wondering if attach_task() should leave kernel threads alone and
> act only upon user-space threads. Or maybe allow movement if it doesn't
> result in changing kernel-threads's cpu affinity.
I tend to favor minimizing kernel support, where user level support
is sufficient. This was especially important in the early years of
Linux 2.6 cpuset kernel work, when it was vital for its acceptance to
minimize the kernel footprint of cpusets as much as I was able.
I would tend to favor encouraging further development of the user
level support for such work, over special cases in the kernel calls
just because we had not yet provided the user code to conveniently
make a certain test, but could well enough do so if need be.
Hopefully, in a few months, when I got off this other, non-cpuset
related, task that I'm on, I will get some time to publish the user
level LGPL licensed library code that makes working with bitmasks and
cpusets convenient in user level C code. The code is in excellent
shape, if I do say so myself.
The next step, which I have fond dreams of getting to, but which is so
low on my managers priority list that there is nearly zero chance of
it happening, would be to provide a scriptable interface to the bitmask
code. I'd probably do that as a Python module, integrating it with
Python sets. This would take little more than some routines to import
and export Python sets to the couple of commonly used ascii
representations of bitmasks, which are those formed by the kernels
lib/bitmap.c bitmap_scnprintf() and bitmap_scnlistprintf() routines.
Such work is sufficiently generic that it might even be acceptable as
a patch to the mainline Python release.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
* Srivatsa Vaddagiri <[email protected]> wrote:
> But I am wondering if attach_task() should leave kernel threads alone
> and act only upon user-space threads. Or maybe allow movement if it
> doesn't result in changing kernel-threads's cpu affinity.
yeah, i'd agree with the latter. We could also special-case the
migration thread to never be migrated itself. (although that's not the
end of the matter either - two ksoftirqds on a cpu are not healty)
Ingo
Ingo, responding to Srivatsa:
> > Or maybe allow movement if it
> > doesn't result in changing kernel-threads's cpu affinity.
>
> yeah, i'd agree ..
Good point. I'd agree too.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
On Thu, Jun 21, 2007 at 10:07:12AM -0700, Paul Jackson wrote:
> Ingo, responding to Srivatsa:
> > > Or maybe allow movement if it
> > > doesn't result in changing kernel-threads's cpu affinity.
> >
> > yeah, i'd agree ..
>
> Good point. I'd agree too.
Yeah .."allow movement if it doesn't result in changing kernel-threads's cpu
affinity" sounds good, except it is hard to implement in cpuset's
context I think. For ex: we now have to take additional steps when
changing 'cpus_allowed' of a cpuset such that it doesn't violate any cpu
affinity of kernel threads bound to the cpuset. That itself makes the
implementation complex I think.
How about a simpler patch which bans movement of kernel threads from its
home cpuset (i.e top cpuset)?
Index: current/kernel/cpuset.c
===================================================================
--- current.orig/kernel/cpuset.c 2007-06-21 19:42:18.000000000 +0530
+++ current/kernel/cpuset.c 2007-06-21 22:24:38.000000000 +0530
@@ -881,6 +881,10 @@
if (cpus_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))
return -ENOSPC;
+ /* Don't allow kernel threads to be moved */
+ if (!tsk->mm)
+ return -EINVAL;
+
return security_task_setscheduler(tsk, 0, NULL);
}
This probably catches exiting user-space tasks also (whose ->mm pointer is
null). Hmm ..there should be a better check for kernel threads.
--
Regards,
vatsa
> Yeah .."allow movement if it doesn't result in changing kernel-threads's cpu
> affinity" sounds good, except it is hard to implement in cpuset's
> context I think. For ex: we now have to take additional steps when
> changing 'cpus_allowed' of a cpuset such that it doesn't violate any cpu
> affinity of kernel threads bound to the cpuset. That itself makes the
> implementation complex I think.
Yes, that would be more complex than we need. You're right.
The only problem comes with kernel tasks that are pinned to less than
the entire system, and that are in the top cpuset.
They should remain in the top cpuset. Since the top cpuset is never
allowed to have less than all the cpus in the system, such kernel
threads can then be assured of being allowed to use the cpus they need
to use.
Could you try coding something like the following in kernel/cpuset.c,
Srivatsa?
/* Call with task_lock held on tsk */
is_pinned_kernel_thread(tsk)
{
/* "pinned" means constrained to a strict subset of online cpus */
int is_pinned = !cpus_subset(cpus_online_map, tsk->cpus_allowed);
int is_kthread = !tsk->mm || (tsk->flags & PF_BORROWED_MM);
return is_pinned && is_kthread;
}
Then call it from kernel/cpuset.c:attach_task() with:
/* Don't moved pinned kernel threads out of top cpuset */
if (is_pinned_kernel_thread(tsk) && oldcs == &top_cpuset && cs != oldcs) {
task_unlock(tsk);
mutex_unlock(&callback_mutex);
put_task_struct(tsk);
return -EINVAL;
}
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401
On Thu, Jun 21, 2007 at 10:51:52AM -0700, Paul Jackson wrote:
> The only problem comes with kernel tasks that are pinned to less than
> the entire system, and that are in the top cpuset.
That again is not fool-proof. What if kernel-tasks change their cpu affinity
after we have done the is_pinned_kernel_thread() test? Ideally they
should not, but one never knows!
IMHO we simply should not allow kernel threads to move out of top-cpuset
(unless you know of a good reason where we may want to move them).
int cpuset_can_attach()
{
int is_kthread = !tsk->mm || (tsk->flags & PF_BORROWED_MM);
...
/* Don't moved pinned kernel threads out of top cpuset */
if (is_kthread && oldcs == &top_cpuset && cs != oldcs) {
task_unlock(tsk);
mutex_unlock(&callback_mutex);
put_task_struct(tsk);
return -EINVAL;
}
}
What do you think?
--
Regards,
vatsa
Srivatsa wrote:
> That again is not fool-proof. What if kernel-tasks change their cpu affinity
> after we have done the is_pinned_kernel_thread() test? Ideally they
> should not, but one never knows!
>
> IMHO we simply should not allow kernel threads to move out of top-cpuset
Well ... in some theoretical world, perhaps.
In reality, there a big pile of kernel threads that we want to move out
of the root cpuset. And in reality, kernel threads don't change from
being unrestricted (all cpus allowed) to being pinned (to some specific
subset of cpus). Kernel threads that need to be pinned know it up
front; it's an essential part of whatever they do.
I've been working with installed customer configurations for about two
and a half years now that move unpinned kernel threads out of the top
cpuset, as part of keeping portions of the system freed up for dedicated
jobs. I cannot agree to removing this capability.
Nack.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.925.600.0401