2015-12-10 21:38:51

Subject: question about cpusets vs sched_setaffinity()

Hi,

I've got a question about the interaction between cpusets and sched_setaffinity().

If I put a task into a cpuset and then call sched_setaffinity() on it, it will
be affined to the intersection of the two sets of cpus. (Those specified on the
set, and those specified in the syscall.)

However, if I then change the cpus in the cpuset the process affinity will
simply be overwritten by the new cpuset affinity. It does not seem to take into
account any restrictions from the original sched_setaffinity() call.

Wouldn't it make more sense to affine the process to the intersection between
the new set of cpus from the cpuset, and the current process affinity? That way
if I explicitly masked out certain CPUs in the original sched_setaffinity() call
then they would remain masked out regardless of changes to the set of cpus
assigned to the cpuset.

Thanks,
Chris

PS: Not subscribed to the list, please CC me on replies.
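
The two behaviours contrasted in the message above can be modelled in user space with plain sets. This is only a sketch of the semantics as described there: the function names are invented for illustration and nothing below calls into the kernel.

```python
# Toy model of the semantics described above; CPU masks are Python sets.
# All function names are hypothetical.

def setaffinity(cpuset_cpus, requested):
    """sched_setaffinity(): the task ends up on (cpuset AND requested mask)."""
    return cpuset_cpus & requested

def cpuset_change_current(new_cpuset_cpus, task_affinity):
    """Behaviour as reported: the new cpuset mask overwrites the affinity."""
    return set(new_cpuset_cpus)

def cpuset_change_proposed(new_cpuset_cpus, task_affinity):
    """Behaviour as proposed: intersect the new cpuset with the current affinity."""
    return new_cpuset_cpus & task_affinity

cpuset = {0, 1, 2, 3}
affinity = setaffinity(cpuset, {2, 3, 4})   # CPU 4 lies outside the cpuset
new_cpuset = {1, 2, 3, 5}

overwritten = cpuset_change_current(new_cpuset, affinity)
intersected = cpuset_change_proposed(new_cpuset, affinity)
print(sorted(affinity), sorted(overwritten), sorted(intersected))
```

In this model the proposed rule keeps CPUs 1 and 5 masked out after the cpuset change, whereas the reported behaviour hands the task every CPU in the new cpuset.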

2015-12-11 22:15:26

by Jason Baron

Subject: Re: question about cpusets vs sched_setaffinity()

On 12/10/2015 04:30 PM, Chris Friesen wrote:
> Hi,
>
> I've got a question about the interaction between cpusets and
> sched_setaffinity().
>
> If I put a task into a cpuset and then call sched_setaffinity() on it,
> it will be affined to the intersection of the two sets of cpus. (Those
> specified on the set, and those specified in the syscall.)
>
> However, if I then change the cpus in the cpuset the process affinity
> will simply be overwritten by the new cpuset affinity. It does not seem
> to take into account any restrictions from the original
> sched_setaffinity() call.
>
> Wouldn't it make more sense to affine the process to the intersection
> between the new set of cpus from the cpuset, and the current process
> affinity? That way if I explicitly masked out certain CPUs in the
> original sched_setaffinity() call then they would remain masked out
> regardless of changes to the set of cpus assigned to the cpuset.
>
> Thanks,
> Chris
>
> PS: Not subscribed to the list, please CC me on replies.

Hi,

This behavior seems a bit odd to me as well - if you've done a
sched_setaffinity() call to a subset of the cpus of a cpuset that the
task is contained within, any change to the cpuset cpus will wipe away
the sched_setaffinity() settings even if they continue to be a subset of
the cpuset cpus.

Adding the behavior you are describing would, I think, require another
cpumask_t field in the task_struct, where we could store the last
requested mask value for sched_setaffinity() and use that when updating
the cpus for a cpuset via an intersection as you described. I think
adding a task to a cpuset still should wipe out any sched_setaffinity()
settings - but that would depend on the desired semantics here. It would
also require a knob so as not to break existing behavior by default.
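
A rough user-space model of that idea might look like the following. Everything here is an assumption made for illustration: the class, the `requested` field standing in for the extra cpumask_t, the `keep_requested` parameter standing in for the knob, and the fallback to the whole cpuset when the intersection is empty. It is not kernel code.

```python
# Hypothetical model: remember the last requested mask alongside the
# effective one. Field and parameter names are invented for this sketch.

class Task:
    def __init__(self, cpuset_cpus):
        self.cpuset_cpus = set(cpuset_cpus)
        self.requested = None               # stands in for the proposed extra field
        self.allowed = set(cpuset_cpus)     # effective affinity

    def setaffinity(self, mask):
        effective = self.cpuset_cpus & set(mask)
        if not effective:
            raise ValueError("mask has no CPU in the cpuset")
        self.requested = set(mask)
        self.allowed = effective

    def cpuset_changed(self, new_cpus, keep_requested=False):
        """keep_requested is the opt-in knob; False models the existing behaviour."""
        self.cpuset_cpus = set(new_cpus)
        effective = self.cpuset_cpus
        if keep_requested and self.requested is not None:
            narrowed = self.cpuset_cpus & self.requested
            # Assumption: fall back to the whole cpuset if nothing is left.
            effective = narrowed or self.cpuset_cpus
        self.allowed = set(effective)

    def attach(self, cpuset_cpus):
        """Moving to another cpuset drops the stored request, as suggested above."""
        self.requested = None
        self.cpuset_cpus = set(cpuset_cpus)
        self.allowed = set(cpuset_cpus)

legacy = Task({0, 1, 2, 3})
legacy.setaffinity({2, 3})
legacy.cpuset_changed({1, 2, 3, 4})                      # knob off

sticky = Task({0, 1, 2, 3})
sticky.setaffinity({2, 3})
sticky.cpuset_changed({1, 2, 3, 4}, keep_requested=True)  # knob on
print(sorted(legacy.allowed), sorted(sticky.allowed))
```

Storing the requested mask rather than intersecting with the current affinity means a CPU that leaves the cpuset and later returns can be picked up again, which a pure running intersection could not do.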

You could also create a child cgroup for the process that you don't want
to change and set the cpus on that cgroup instead of using
sched_setaffinity(). Then you change the cpus for the parent cgroup and
that shouldn't affect the child as long as the child cgroup is a subset.
But it's not entirely clear to me whether that addresses your use-case.

Thanks,

-Jason

2015-12-11 23:35:02

by Chris Friesen

Subject: Re: question about cpusets vs sched_setaffinity()

On 12/11/2015 04:15 PM, Jason Baron wrote:
> On 12/10/2015 04:30 PM, Chris Friesen wrote:

>> If I put a task into a cpuset and then call sched_setaffinity() on it,
>> it will be affined to the intersection of the two sets of cpus. (Those
>> specified on the set, and those specified in the syscall.)
>>
>> However, if I then change the cpus in the cpuset the process affinity
>> will simply be overwritten by the new cpuset affinity. It does not seem
>> to take into account any restrictions from the original
>> sched_setaffinity() call.
>>
>> Wouldn't it make more sense to affine the process to the intersection
>> between the new set of cpus from the cpuset, and the current process
>> affinity? That way if I explicitly masked out certain CPUs in the
>> original sched_setaffinity() call then they would remain masked out
>> regardless of changes to the set of cpus assigned to the cpuset.

<snip>

> Adding the behavior you are describing would, I think, require another
> cpumask_t field in the task_struct, where we could store the last
> requested mask value for sched_setaffinity() and use that when updating
> the cpus for a cpuset via an intersection as you described. I think
> adding a task to a cpuset still should wipe out any sched_setaffinity()
> settings - but that would depend on the desired semantics here. It would
> also require a knob so as not to break existing behavior by default.

Agreed, the additional field in the task_struct makes sense. Personally I don't
think that adding a task to a cpuset should wipe out any previously-set
affinity; I think it should take the intersection for that case as well.

In this environment it might make sense to have separate queries to return the
requested and actual affinity.
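
For comparison, the only query available through sched_getaffinity() is the mask the task may currently run on. A caller that also wants the requested mask has to remember it itself, roughly as in this sketch. The helper name `pin_and_report` is made up; `os.sched_setaffinity()` and `os.sched_getaffinity()` are the standard Python wrappers and exist only on platforms that provide the underlying syscalls (Linux in practice).

```python
import os

def pin_and_report(requested, pid=0):
    """Apply `requested` and return (requested, actual) for the given pid.

    The requested mask is remembered here in user space; the actual mask
    is whatever the kernel reports after applying it.
    """
    requested = set(requested)
    os.sched_setaffinity(pid, requested)   # pid 0 means the calling process
    actual = os.sched_getaffinity(pid)
    return requested, actual
```

If the requested mask includes CPUs outside the task's cpuset, the two returned sets differ, which is exactly the information a separate "requested" query would expose.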

> You could also create a child cgroup for the process that you don't want
> to change and set the cpus on that cgroup instead of using
> sched_setaffinity(). Then you change the cpus for the parent cgroup and
> that shouldn't affect the child as long as the child cgroup is a subset.
> But it's not entirely clear to me whether that addresses your use-case.

I ended up doing something like this where I had a top-level cpuset and a number
of child cpusets, each with an exclusive subset of the CPUs assigned to it. But
it meant that I needed more complicated code to figure out which tasks needed to
go into which child cpusets, and more complicated code to handle removing a CPU
from the top-level cpuset (since you have to remove it from any children first).
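
The ordering constraint mentioned above (children first, then the parent) can be sketched as a small planning helper. The cpuset names are hypothetical, and refusing to empty a child is a choice made for this sketch. The strings it returns use the compact list format, such as "0-2,5", that cpuset.cpus files accept.

```python
# Sketch: work out which cpuset.cpus values to write, and in what order,
# when one CPU is removed from a top-level cpuset with child cpusets.

def cpulist(cpus):
    """Format a set of CPU numbers as a compact list such as '0-2,5'."""
    parts, run = [], []
    for cpu in sorted(cpus):
        if run and cpu == run[-1] + 1:
            run.append(cpu)
            continue
        if run:
            parts.append(run)
        run = [cpu]
    if run:
        parts.append(run)
    return ",".join(str(r[0]) if len(r) == 1 else f"{r[0]}-{r[-1]}" for r in parts)

def plan_cpu_removal(top_cpus, children, cpu):
    """Return (cpuset, new cpulist) writes: affected children first, parent last."""
    writes = []
    for name, cpus in children.items():
        if cpu in cpus:
            remaining = cpus - {cpu}
            if not remaining:
                raise ValueError(f"{name} would be left with no CPUs")
            writes.append((name, cpulist(remaining)))
    writes.append(("top", cpulist(top_cpus - {cpu})))
    return writes

plan = plan_cpu_removal(
    {0, 1, 2, 3, 4, 5},
    {"top/a": {0, 1, 2}, "top/b": {3, 4, 5}},
    4,
)
print(plan)
```

Each tuple would then be applied in order by writing the string to the corresponding cpuset.cpus file, wherever the cpuset hierarchy is mounted on the system in question.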

Chris

2015-12-14 22:14:32

by Jason Baron

Subject: Re: question about cpusets vs sched_setaffinity()

On 12/11/2015 06:26 PM, Chris Friesen wrote:
> On 12/11/2015 04:15 PM, Jason Baron wrote:
>> On 12/10/2015 04:30 PM, Chris Friesen wrote:
>
>>> If I put a task into a cpuset and then call sched_setaffinity() on it,
>>> it will be affined to the intersection of the two sets of cpus. (Those
>>> specified on the set, and those specified in the syscall.)
>>>
>>> However, if I then change the cpus in the cpuset the process affinity
>>> will simply be overwritten by the new cpuset affinity. It does not seem
>>> to take into account any restrictions from the original
>>> sched_setaffinity() call.
>>>
>>> Wouldn't it make more sense to affine the process to the intersection
>>> between the new set of cpus from the cpuset, and the current process
>>> affinity? That way if I explicitly masked out certain CPUs in the
>>> original sched_setaffinity() call then they would remain masked out
>>> regardless of changes to the set of cpus assigned to the cpuset.
>
> <snip>
>
>> Adding the behavior you are describing would, I think, require another
>> cpumask_t field in the task_struct, where we could store the last
>> requested mask value for sched_setaffinity() and use that when updating
>> the cpus for a cpuset via an intersection as you described. I think
>> adding a task to a cpuset still should wipe out any sched_setaffinity()
>> settings - but that would depend on the desired semantics here. It would
>> also require a knob so as not to break existing behavior by default.
>
> Agreed, the additional field in the task_struct makes sense. Personally
> I don't think that adding a task to a cpuset should wipe out any
> previously-set affinity; I think it should take the intersection for
> that case as well.
>
> In this environment it might make sense to have separate queries to
> return the requested and actual affinity.
>

So because cpumask_t is dimensioned by NR_CPUS, I think we would need a
pointer to the cpumask_t field. And we could allocate it when we want
the cpus set by sched_setaffinity() to persist across the cgroup cpuset
cpu changes. I think you are right that a flag to
sched_[set|get]affinity() for this case might be nice - but that would
require a new syscall...
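
To put rough numbers on the size argument: a mask with one bit per possible CPU, rounded up to whole longs, grows linearly with NR_CPUS, so embedding a second one in every task costs memory even for tasks that never use it. The sketch below assumes a 64-bit kernel; the class and field names are hypothetical and only illustrate the "allocate on demand" idea.

```python
# Back-of-the-envelope model; names are invented for illustration.
BITS_PER_LONG = 64   # assumption: 64-bit kernel

def cpumask_bytes(nr_cpus):
    """Size of a bitmap with one bit per possible CPU, in whole longs."""
    longs = (nr_cpus + BITS_PER_LONG - 1) // BITS_PER_LONG
    return longs * (BITS_PER_LONG // 8)

class TaskModel:
    def __init__(self):
        self.requested_mask = None       # models a pointer left NULL by default

    def remember_request(self, cpus):
        # Allocated only for tasks that want their request to persist.
        self.requested_mask = frozenset(cpus)

print(cpumask_bytes(64), cpumask_bytes(4096), cpumask_bytes(8192))
```

With a pointer, tasks that never opt in pay only for the pointer itself rather than for a full mask.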

>> You could also create a child cgroup for the process that you don't want
>> to change and set the cpus on that cgroup instead of using
>> sched_setaffinity(). Then you change the cpus for the parent cgroup and
>> that shouldn't affect the child as long as the child cgroup is a subset.
>> But it's not entirely clear to me whether that addresses your use-case.
>
> I ended up doing something like this where I had a top-level cpuset and
> a number of child cpusets, each with an exclusive subset of the CPUs
> assigned to it. But it meant that I needed more complicated code to
> figure out which tasks needed to go into which child cpusets, and more
> complicated code to handle removing a CPU from the top-level cpuset
> (since you have to remove it from any children first).
>
> Chris

I agree that it would be nice to improve this interface, since you are
creating extra cgroups here just to sort of work around this.

Thanks,

-Jason