Linus,
I have updated the patch a bit and resynced to 2.5.6. Are you
interested? I believe a user interface for setting task CPU affinity is
useful and completes the rest of our sched_* syscalls. A syscall
implementation seems to be what everyone wants (I have a proc-interface,
too...)
This patch implements
int sched_set_affinity(pid_t pid, unsigned int len,
unsigned long *new_mask_ptr);
int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
unsigned long *user_mask_ptr)
which set and get the cpu affinity (task->cpus_allowed) for a task,
using the set_cpus_allowed function in Ingo's scheduler. The functions
properly support changes to cpus_allowed, implement security, and are
well-tested.
They are based on Ingo's older affinity syscall patch and my older
affinity proc patch.
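For illustration, a minimal user-space test might look like this (a
sketch only, not part of the patch; it assumes the i386 syscall numbers
240/241 assigned in the patch below and the generic syscall(2) wrapper):

/* Hypothetical test program; assumes the i386 syscall numbers
 * assigned in the patch below. */
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

#define __NR_sched_set_affinity 240
#define __NR_sched_get_affinity 241

int main(void)
{
	unsigned long mask = 1;			/* CPU 0 only */
	unsigned int len = sizeof(mask);

	/* Bind ourselves to CPU 0. */
	if (syscall(__NR_sched_set_affinity, getpid(),
		    sizeof(mask), &mask) < 0) {
		perror("sched_set_affinity");
		return 1;
	}

	/* Read the mask back; the kernel writes its own mask size
	 * into len and fails with EINVAL if our buffer is smaller. */
	if (syscall(__NR_sched_get_affinity, getpid(), &len, &mask) < 0) {
		perror("sched_get_affinity");
		return 1;
	}
	printf("mask = 0x%lx (len = %u)\n", mask, len);
	return 0;
}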
Comments?
Robert Love
diff -urN linux-2.5.6/arch/i386/kernel/entry.S linux/arch/i386/kernel/entry.S
--- linux-2.5.6/arch/i386/kernel/entry.S Thu Mar 7 21:18:19 2002
+++ linux/arch/i386/kernel/entry.S Sun Mar 10 13:01:03 2002
@@ -717,6 +717,8 @@
.long SYMBOL_NAME(sys_fremovexattr)
.long SYMBOL_NAME(sys_tkill)
.long SYMBOL_NAME(sys_sendfile64)
+ .long SYMBOL_NAME(sys_sched_set_affinity) /* 240 */
+ .long SYMBOL_NAME(sys_sched_get_affinity)
.rept NR_syscalls-(.-sys_call_table)/4
.long SYMBOL_NAME(sys_ni_syscall)
diff -urN linux-2.5.6/include/asm-i386/unistd.h linux/include/asm-i386/unistd.h
--- linux-2.5.6/include/asm-i386/unistd.h Thu Mar 7 21:18:55 2002
+++ linux/include/asm-i386/unistd.h Sun Mar 10 13:03:41 2002
@@ -244,6 +244,8 @@
#define __NR_fremovexattr 237
#define __NR_tkill 238
#define __NR_sendfile64 239
+#define __NR_sched_set_affinity 240
+#define __NR_sched_get_affinity 241
/* user-visible error numbers are in the range -1 - -124: see <asm-i386/errno.h> */
diff -urN linux-2.5.6/kernel/sched.c linux/kernel/sched.c
--- linux-2.5.6/kernel/sched.c Thu Mar 7 21:18:19 2002
+++ linux/kernel/sched.c Sun Mar 10 12:59:26 2002
@@ -1215,6 +1215,95 @@
return retval;
}
+/**
+ * sys_sched_set_affinity - set the cpu affinity of a process
+ * @pid: pid of the process
+ * @len: length of new_mask
+ * @new_mask: user-space pointer to the new cpu mask
+ */
+asmlinkage int sys_sched_set_affinity(pid_t pid, unsigned int len,
+ unsigned long *new_mask_ptr)
+{
+ unsigned long new_mask;
+ task_t *p;
+ int retval;
+
+ if (len < sizeof(new_mask))
+ return -EINVAL;
+
+ if (copy_from_user(&new_mask, new_mask_ptr, sizeof(new_mask)))
+ return -EFAULT;
+
+ new_mask &= cpu_online_map;
+ if (!new_mask)
+ return -EINVAL;
+
+ read_lock(&tasklist_lock);
+
+ retval = -ESRCH;
+ p = find_process_by_pid(pid);
+ if (!p)
+ goto out_unlock;
+
+ retval = -EPERM;
+ if ((current->euid != p->euid) && (current->euid != p->uid) &&
+ !capable(CAP_SYS_NICE))
+ goto out_unlock;
+
+ retval = 0;
+#ifdef CONFIG_SMP
+ set_cpus_allowed(p, new_mask);
+#endif
+
+out_unlock:
+ read_unlock(&tasklist_lock);
+ return retval;
+}
+
+/**
+ * sys_sched_get_affinity - get the cpu affinity of a process
+ * @pid: pid of the process
+ * @user_len_ptr: userspace pointer to the length of the mask
+ * @user_mask_ptr: userspace pointer to the mask
+ */
+asmlinkage int sys_sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
+ unsigned long *user_mask_ptr)
+{
+ unsigned long mask;
+ unsigned int len, user_len;
+ task_t *p;
+ int retval;
+
+ len = sizeof(mask);
+
+ if (copy_from_user(&user_len, user_len_ptr, sizeof(user_len)))
+ return -EFAULT;
+
+ if (copy_to_user(user_len_ptr, &len, sizeof(len)))
+ return -EFAULT;
+
+ if (user_len < len)
+ return -EINVAL;
+
+ read_lock(&tasklist_lock);
+
+ retval = -ESRCH;
+ p = find_process_by_pid(pid);
+ if (!p)
+ goto out_unlock;
+
+ retval = 0;
+ mask = p->cpus_allowed & cpu_online_map;
+
+out_unlock:
+ read_unlock(&tasklist_lock);
+ if (retval)
+ return retval;
+ if (copy_to_user(user_mask_ptr, &mask, sizeof(mask)))
+ return -EFAULT;
+ return 0;
+}
+
asmlinkage long sys_sched_yield(void)
{
runqueue_t *rq;
> Anon! But there is something uber-ugly about constantly jamming more
> and more stuff into procfs without thinking or planning long term... I
> vote for the non-procfs approach :)
At some point I had done a port of SGI's pset/sysmp interface to linux 2.2.
As far as I know, lots of people are still using it. I haven't ported it
to 2.4 for various reasons, but I have to say - IT IS A MUCH BETTER
INTERFACE than all these ad-hoc cpus_allowed bits.
If I thought that it had a chance of inclusion, maybe I'd port it up, but
last I heard none of the "core" people wanted it.
If we are going to pick an affinity system, please, let's consider sysmp().
Tim
Robert Love <[email protected]> writes:
> Linus,
>
> I have updated the patch a bit and resynced to 2.5.6. Are you
> interested? I believe a user interface for setting task CPU affinity is
> useful and completes the rest of our sched_* syscalls. A syscall
> implementation seems to be what everyone wants (I have a proc-interface,
> too...)
Please add the proc interface also! I've found it today (for 2.4.18)
and it's much easier to use with existing programs.
Andreas
> This patch implements
>
> int sched_set_affinity(pid_t pid, unsigned int len,
> unsigned long *new_mask_ptr);
>
> int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
> unsigned long *user_mask_ptr)
>
> which set and get the cpu affinity (task->cpus_allowed) for a task,
> using the set_cpus_allowed function in Ingo's scheduler. The functions
> properly support changes to cpus_allowed, implement security, and are
> well-tested.
>
> They are based on Ingo's older affinity syscall patch and my older
> affinity proc patch.
>
> Comments?
Please add it for all archs - this is not only interesting for x86,
Andreas
[...]
--
Andreas Jaeger
SuSE Labs [email protected]
private [email protected]
http://www.suse.de/~aj
On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
> Please add the proc interface also! I've found it today (for 2.4.18)
> and it's much easier to use with existing programs.
I agree and I really like the proc-interface. There is something uber
cool about:
cat 1 > /proc/pid/affinity
I have a patch for 2.5.6 for proc-based affinity interface here:
http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch
I suspect, however, that despite both patches being small we really only
want to pick and standardize on one. The syscall interface has two main
things going for it over a proc-based implementation: it is faster
and /proc may not be mounted. The masses have spoken on this issue.
Note you can use the syscall interface with existing programs, too.
Just write a program to take in a pid and mask and call
sched_set_affinity.
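Such a program is only a few lines. An untested sketch, assuming the
i386 syscall number from my patch:

/* setaffinity.c - hypothetical sketch, untested.
 * usage: setaffinity <pid> <mask> */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/syscall.h>

#define __NR_sched_set_affinity 240	/* i386, from the patch */

int main(int argc, char *argv[])
{
	pid_t pid;
	unsigned long mask;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <pid> <mask>\n", argv[0]);
		return 1;
	}
	pid = (pid_t) atoi(argv[1]);
	mask = strtoul(argv[2], NULL, 0);

	/* Retarget any existing process, e.g. "setaffinity 42 0x3". */
	if (syscall(__NR_sched_set_affinity, pid, sizeof(mask), &mask) < 0) {
		perror("sched_set_affinity");
		return 1;
	}
	return 0;
}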
> Please add it for all archs - this is not only interesting for x86,
I'll send Linus the patch for other arches if/when he accepts this patch
- I have no problem with that.
Robert Love
Robert Love <[email protected]> writes:
> On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
>
>> Please add the proc interface also! I've found it today (for 2.4.18)
>> and it's much easier to use with existing programs.
>
> I agree and I really like the proc-interface. There is something uber
> cool about:
>
> cat 1 > /proc/pid/affinity
I agree.
> I have a patch for 2.5.6 for proc-based affinity interface here:
>
> http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch
>
> I suspect, however, that despite both patches being small we really only
> want to pick and standardize on one. The syscall interface has two main
> things going for it over a proc-based implementation: it is faster
> and /proc may not be mounted. The masses have spoken on this issue.
>
> Note you can use the syscall interface with existing programs, too.
> Just write a program to take in a pid and mask and call
> sched_set_affinity.
What I need at the moment is a wrapper - and you can do it two ways:
$ run_with_affinity 1 program arguments...
$ (cat 1 > /proc/self/affinity; program arguments...)
The second one is much easier coded ;-)
>> Please add it for all archs - this is not only interesting for x86,
>
> I'll send Linus the patch for other arches if/when he accepts this patch
> - I have no problem with that.
Thanks,
Andreas
--
Andreas Jaeger
SuSE Labs [email protected]
private [email protected]
http://www.suse.de/~aj
On Sun, Mar 10, 2002 at 01:15:03PM -0500, Robert Love wrote:
> I have updated the patch a bit and resynced to 2.5.6. Are you
> interested? I believe a user interface for setting task CPU
> affinity is useful and completes the rest of our sched_* syscalls.
> A syscall implementation seems to be what everyone wants (I have a
> proc-interface, too...)
Can't we just copy the IRIX interface here as some other patches have
in the past?
--cw
On Sun, 2002-03-10 at 17:05, Chris Wedgwood wrote:
> Can't we just copy the IRIX interface here as some other patches have
> in the past?
Is that psets? If so, no thanks.
I want a simple, clean, quick implementation. I have seen patches that
do a lot more than what my simple implementation does, and that really
does not interest me and I suspect Ingo and others feel the same way.
Setting a simple per-task bitmask that is inherited is all we need.
The Linux scheduler API is already our own standard. I'd rather support
that (i.e. add another simple sched_* call) than some evil other
interface - but that is just me.
Robert Love
Andreas Jaeger <[email protected]> writes:
|> What I need at the moment is a wrapper - and you can do it two ways:
|>
|> $ run_with_affinity 1 program arguments...
|> $ (cat 1 > /proc/self/affinity; program arguments...)
|>
|> The second one is much easier coded ;-)
Apparently not, since that should be
$ (echo 1 > /proc/self/affinity; program arguments...)
:-)
Andreas.
--
Andreas Schwab, SuSE Labs, [email protected]
SuSE GmbH, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Robert Love wrote:
>
> On Sun, 2002-03-10 at 15:29, Andreas Jaeger wrote:
>
> > Please add the proc interface also! I've found it today (for 2.4.18)
> > and it's much easier to use with existing programs.
>
> I agree and I really like the proc-interface. There is something uber
> cool about:
>
> cat 1 > /proc/pid/affinity
>
> I have a patch for 2.5.6 for proc-based affinity interface here:
>
> http://www.kernel.org/pub/linux/kernel/people/rml/cpu-affinity/v2.5/cpu-affinity-proc-rml-2.5.6-1.patch
Anon! But there is something uber-ugly about constantly jamming more
and more stuff into procfs without thinking or planning long term... I
vote for the non-procfs approach :)
--
Jeff Garzik | Usenet Rule #2 (John Gilmore): "The Net interprets
Building 1024 | censorship as damage and routes around it."
MandrakeSoft |
On Sun, Mar 10, 2002 at 10:03:02PM +0100, Andreas Jaeger wrote:
> >
> > Note you can use the syscall interface with existing programs, too.
> > Just write a program to take in a pid and mask and call
> > sched_set_affinity.
> What I need at the moment is a wrapper - and you can do it two ways:
>
> $ run_with_affinity 1 program arguments...
> $ (cat 1 > /proc/self/affinity; program arguments...)
>
> The second one is much easier coded ;-)
$ (set_affinity 1; program arguments...)
set_affinity just calls sched_set_affinity(getppid()), and everything
is fine (and even shorter to type) :-)
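A sketch of such a set_affinity helper (hypothetical, assuming the i386
syscall number from Robert's patch):

/* set_affinity.c - hypothetical sketch of the helper above.
 * Sets the affinity of the *parent* (the subshell), so everything
 * the shell starts afterwards inherits the mask across fork/exec. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/syscall.h>

#define __NR_sched_set_affinity 240	/* i386, from the patch */

int main(int argc, char *argv[])
{
	unsigned long mask;

	if (argc != 2) {
		fprintf(stderr, "usage: set_affinity <mask>\n");
		return 1;
	}
	mask = strtoul(argv[1], NULL, 0);

	if (syscall(__NR_sched_set_affinity, getppid(),
		    sizeof(mask), &mask) < 0) {
		perror("sched_set_affinity");
		return 1;
	}
	return 0;
}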
Andreas
--
Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
---------------------------------------------------------
+49 521 1365800 - [email protected] - http://www.devcon.net
mpadmin(1)
NAME
mpadmin - control and report processor status
SYNOPSIS
mpadmin -n
mpadmin -u[processor]
mpadmin -r[processor]
mpadmin -c[processor]
mpadmin -f[processor]
mpadmin -I[processor]
mpadmin -U[processor]
mpadmin -D[processor]
mpadmin -C[processor]
mpadmin -s
DESCRIPTION
mpadmin provides control/information of processor status.
Exactly one argument is accepted by mpadmin at each invocation. The
following arguments are accepted:
-n Report which processors are physically configured. The
numbers of the physically configured processors are written
to the standard output, one processor number per line.
Processors are numbered beginning from 0.
-u[processor]
When no processor is specified, the numbers of the
processors that are available to schedule unrestricted
processes are written to the standard output. Otherwise,
mpadmin enables the processor number processor to run any
unrestricted processes.
-r[processor]
When no processor is specified, the numbers of the
processors that are restricted from running any processes
(except those assigned via the sysmp(MP_MUSTRUN) function,
the runon(1) command, or because of hardware necessity) are
written to the standard output. Otherwise, mpadmin
restricts the processor numbered processor.
-c[processor]
When no processor is specified, the number of the processor
that handles the operating system software clock is written
to the standard output. Otherwise, operating system
software clock handling is moved to the processor numbered
processor. See timers(5) for more details.
-f[processor]
When no processor is specified, the number of the processor
that handles the operating system fast clock is written to
the standard output. Otherwise, operating system fast clock
handling is moved to the processor numbered processor. See
ftimer(1) and timers(5) for a description of the fast clock
usage.
-I[processor]
When no processor is specified, the numbers of the
processors that are isolated are written to the standard
output. Otherwise, mpadmin isolates the processor numbered
processor. An isolated processor is restricted as by the -r
argument. In addition, instruction cache and Translation
Lookaside Buffer synchronization are blocked, and
synchronization is delayed until a system service is
requested.
-U[processor]
When no processor is specified, the numbers of the
processors that are not isolated are written to the standard
output. Otherwise, mpadmin unisolates the processor
numbered processor.
-D[processor]
When no processor is specified, the numbers of the
processors that are not running the clock scheduler are
written to the standard output. Otherwise, mpadmin disables
the clock scheduler on the processor numbered processor.
This makes that processor nonpreemptive, so that normal IRIX
process time slicing is no longer enforced. Processes that
run on a non-preemptive processor are not preempted because
of timer interrupts. They are preempted only when
requesting a system service that causes them to wait, or
that makes a higher-priority process runnable (for example,
posting a semaphore).
-C[processor]
When no processor is specified, the numbers of the
processors that are running the clock scheduler are written
to the standard output. Otherwise, mpadmin enables the
clock scheduler on the processor numbered processor.
Processes on a preemptive processor can be preempted at the
end of their time slice.
-s A summary of the unrestricted, restricted, isolated,
preemptive and clock processor numbers is written to the
standard output.
SEE ALSO
ftimer(1), runon(1), sysmp(2), timers(5).
DIAGNOSTICS
When an argument specifies a processor, 0 is returned on success, -1 on
failure. Otherwise, the number of processors associated with argument is
returned.
WARNINGS
It is not possible to restrict or isolate all processors. Processor 0
must never be restricted or isolated.
BUGS
Changing the clock processor may cause the system to lose a small amount
of system time.
When a processor is not provided as an argument, mpadmin's exit value
will not exceed 255. If more than 255 processors exist, mpadmin will
return 0.
> > If we are going to pick an affinity system, please, let's consider sysmp().
>
> Not too bad. I picked a random sysmp(2) man page off the net (attached
> for ease of others' reference).
So, there are actually two parts to sysmp(). The way SGI used to do it is
with Pset (MP_PSET to sysmp()). They seem to have dropped exported support
for PSets - don't know why. The idea is this.
At boot the system creates a PSet with ALL processors, and one set for each
single CPU. Root can define extra sets with specified CPUs, too.
Processes can then run (commandline tool = 'runon') on a specific Pset.
runon 3 yes # runs on PSET #3
This is ok, but it has several drawbacks:
* a user cannot run on an arbitrary set of procs
* defining a set for every combination of procs is ludicrous
However, it has several upsides
* disabling a CPU is as simple as removing it from a pset struct, not
iterating over all tasks
* conceptually hides the 'bitmask of CPUs'
> It duplicates some stuff set elsewhere, and seems more than a bit like
> ioctl(2) by another name, but doesn't seem too bad. Note we should be
> careful not to overengineer the interface, either...
At some point Ralf Baechle asked me to extend it more for IRIX
compatibility. We may want to just drop that altogether. Several of the
sysmp() interfaces can be handled at the library layer and re-routed to
their existing interfaces.
> Just setting a bitmask does seem a bit limiting when thinking about the
> future, agreed.
What is the future of the existing CPUs bitmask? Is it becoming something
else?
Perhaps we want to keep sysmp() in name and form, perhaps just in name,
perhaps not at all. This is an area in which I have (had, but could get
again) a lot of interest, but before I waste any more time on it, I'd like
to actually co-design a feature set.
What do we want:
* unprivileged ability to change current->pset?
- any user can call sysmp(MP_RUNON) anytime
* privileged ability only (runon becomes suid)
- can "trap" processes to a CPU - it has been requested a lot
* processor sets or just bitmasks/lists?
- someone was working on memory sets, similar to psets
If we really want this, I definitely want to help. :)
Tim
On Sun, Mar 10, 2002 at 01:15:03PM -0500, Robert Love wrote:
>
> This patch implements
>
> int sched_set_affinity(pid_t pid, unsigned int len,
> unsigned long *new_mask_ptr);
>
> int sched_get_affinity(pid_t pid, unsigned int *user_len_ptr,
> unsigned long *user_mask_ptr)
>
> which set and get the cpu affinity (task->cpus_allowed) for a task,
> using the set_cpus_allowed function in Ingo's scheduler. The functions
> properly support changes to cpus_allowed, implement security, and are
> well-tested.
Setting the affinity of a whole process group also makes sense IMHO.
Therefore I think an interface more like the setpriority syscall
for sched_set_affinity (with two parameters which/who instead of a
single PID) would be more flexible, e.g.
int sched_set_affinity(int which, int who, unsigned int len,
unsigned long *new_mask_ptr);
with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
to the value of who.
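For comparison, binding a whole process group under that prototype would
mirror setpriority(2) usage. A hypothetical sketch; this syscall variant
does not exist:

#include <unistd.h>
#include <sys/resource.h>	/* PRIO_PROCESS, PRIO_PGRP, PRIO_USER */

/* Hypothetical prototype - proposed, not implemented anywhere. */
extern int sched_set_affinity(int which, int who, unsigned int len,
                              unsigned long *new_mask_ptr);

/* Bind every process in our process group to CPU 0. */
int bind_pgrp_to_cpu0(void)
{
	unsigned long mask = 1;

	return sched_set_affinity(PRIO_PGRP, getpgrp(), sizeof(mask), &mask);
}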
Getting the mask of a group of processes doesn't make sense though
(what if they differ?), so the current interface of sched_get_affinity
is just fine IMHO.
Andreas
--
Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
---------------------------------------------------------
+49 521 1365800 - [email protected] - http://www.devcon.net
Picking nits, but....
Andreas Ferber wrote:
> Setting the affinity of a whole process group also makes sense IMHO.
> Therefore I think an interface more like the setpriority syscall
> for sched_set_affinity (with two parameters which/who instead of a
> single PID) would be more flexible, e.g.
>
> int sched_set_affinity(int which, int who, unsigned int len,
> unsigned long *new_mask_ptr);
>
> with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
> to the value of who.
I would suggest that the order be
int sched_set_affinity(int who, int which, unsigned int len,
unsigned long *new_mask_ptr);
This would have the {p,pg}id be the first thing that a programmer
would see (likely more important than the 'which').
--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.
On Fri, Mar 15, 2002 at 02:06:04PM -0800, Stephen Samuel wrote:
> >
> > int sched_set_affinity(int which, int who, unsigned int len,
> > unsigned long *new_mask_ptr);
> >
> > with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
> > to the value of who.
Uh, who/which should be just the other way round in the description
(but not in the prototype). Sorry.
> I would suggest that the order be
>
> int sched_set_affinity(int who, int which, unsigned int len,
> unsigned long *new_mask_ptr);
>
> This would have the {p,pg}id be the first thing that a programmer
> would see (likely more important than the 'which').
See my correction above, does that address your concern?
Andreas
--
Andreas Ferber - dev/consulting GmbH - Bielefeld, FRG
---------------------------------------------------------
+49 521 1365800 - [email protected] - http://www.devcon.net
Almost... Same effect (mostly)...
It does, however, leave us arguing the linguistic semantics of
which name 'who' should have. It seems to me that the most
natural would be with 'who' being the 'name' of the target, and
'which' specifying which name space 'who' is operating in.
UGH: messing with these names via pronouns is too confusing:
-----------
How about this:
int sched_set_affinity(int who, int which, unsigned int len,
unsigned long *new_mask_ptr);
'who' being a {process, process-group, or user} ID, and
with 'which' being one of {PRIO_PROCESS, PRIO_PGRP, PRIO_USER},
respectively -- specifying which namespace 'who' operates in.
I think that that is what you were trying to say, right?
Andreas Ferber wrote:
> On Fri, Mar 15, 2002 at 02:06:04PM -0800, Stephen Samuel wrote:
>
>> >
>> > int sched_set_affinity(int which, int who, unsigned int len,
>> > unsigned long *new_mask_ptr);
>> >
>> > with who one of {PRIO_PROCESS,PRIO_PGRP,PRIO_USER} and which according
>> > to the value of who.
>>
>
> Uh, who/which should be just the other way round in the description
> (but not in the prototype). Sorry.
>
>
>>I would suggest that the order be
>>
>>int sched_set_affinity(int who, int which, unsigned int len,
>> unsigned long *new_mask_ptr);
>>
>>This would have the {p,pg}id be the first thing that a programmer
>>would see (likely more important than the 'which').
--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.