2008-06-18 14:12:39

by Daniel K.

Subject: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

mkdir /dev/cgroup
mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup

mkdir /dev/cgroup/0

echo 3 > /dev/cgroup/0/cpuset.cpus
echo 0 > /dev/cgroup/0/cpuset.mems
echo 100000 > /dev/cgroup/0/cpu.rt_period_us
echo 5000 > /dev/cgroup/0/cpu.rt_runtime_us

schedtool -R -p 1 -e burnP6 &
[1] 3309
echo 3309 > /dev/cgroup/0/tasks

At this point I'd expect the burnP6 task to use 5% of the available CPU
resources in the cgroup (5000/100000), but the real CPU usage, as
reported by top, is 20%. This is 4 times the expected result, and as I
have 4 cores, I think that is a strong hint of a correlation.

Maybe with a 4-core system there really are 4,000,000 us available for
every wall-clock second?
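
(A quick back-of-the-envelope check of those numbers, assuming each of
the 4 cores contributes its own period per wall-clock second:)

# per-core share granted to the group: 5000 / 100000 = 5%
echo "scale=2; 5000 / 100000" | bc       # .05
# if all 4 cores honour that budget independently: 4 * 5% = 20%
echo "scale=2; 4 * 5000 / 100000" | bc   # .20 -- matches what top reports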

However, I have only assigned one core (3) to _this_ cgroup, so I think
this cgroup is overusing its assigned resources.

What do you think?


Daniel K.


2008-06-18 14:37:30

by Peter Zijlstra

Subject: Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

On Wed, 2008-06-18 at 16:12 +0200, Daniel K. wrote:
> mkdir /dev/cgroup
> mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup
>
> mkdir /dev/cgroup/0
>
> echo 3 > /dev/cgroup/0/cpuset.cpus
> echo 0 > /dev/cgroup/0/cpuset.mems
> echo 100000 > /dev/cgroup/0/cpu.rt_period_us
> echo 5000 > /dev/cgroup/0/cpu.rt_runtime_us
>
> schedtool -R -p 1 -e burnP6 &
> [1] 3309
> echo 3309 > /dev/cgroup/0/tasks
>
> At this point I'd expect the burnP6 task to use 5% of the available CPU
> resources in the cgroup (5000/100000), but the real CPU usage, as
> reported by top, is 20%. This is 4 times the expected result, and as I
> have 4 cores, I think that is a strong hint of a correlation.
>
> Maybe with a 4-core system there really are 4,000,000 us available for
> every wall-clock second?

Indeed. In effect each cpu (see below on specifics) gets the
runtime/period you specify, and it moves unused runtime between cpus.
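
(The same runtime/period model also exists system-wide; assuming a
standard /proc, the global knobs can be inspected like this:)

# global RT bandwidth, same runtime/period semantics as the cgroup files
cat /proc/sys/kernel/sched_rt_period_us    # usually 1000000 (1 second)
cat /proc/sys/kernel/sched_rt_runtime_us   # usually 950000, i.e. RT may use 95% per cpu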

> However, I have only assigned one core (3) to _this_ cgroup, so I think
> this cgroup is overusing its assigned resources.
>
> What do you think?

I think you're on to something :-)

It uses root domains, that is the largest domain this cpu is part of
that has load-balancing enabled.

So while you have made your process part of the cgroup and the cpuset,
there is no strong relation between them; that is to say, I could
mount the cpuset and cpu controllers on different mount points and add
tasks to one but not the other.

So the relation I used is that of load-balance domains.
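
(One way to see the load-balance domain hierarchy a cpu is part of,
assuming the kernel was built with CONFIG_SCHED_DEBUG so the domain
tree is exported under /proc:)

# inspect the load-balance domains for cpu 3
ls /proc/sys/kernel/sched_domain/cpu3/
cat /proc/sys/kernel/sched_domain/cpu3/domain0/flags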

So in order to get what you intended, do something like:


mkdir -p /dev/cpuset /cgroup/cpu

mount -t cgroup -o cpuset none /dev/cpuset
mount -t cgroup -o cpu none /cgroup/cpu

mkdir /dev/cpuset/root
mkdir /dev/cpuset/rt

#
# this might not actually make the kernel happy
# as it might attempt (and possibly succeed in)
# moving cpu bound kernel threads
#
for i in `cat /dev/cpuset/tasks`; do
echo $i > /dev/cpuset/root/tasks;
done

echo 0-2 > /dev/cpuset/root/cpuset.cpus
echo 3 > /dev/cpuset/rt/cpuset.cpus

echo 0 > /dev/cpuset/cpuset.sched_load_balance

mkdir /cgroup/cpu/foo
echo 100000 > /cgroup/cpu/foo/cpu.rt_period_us
echo 5000 > /cgroup/cpu/foo/cpu.rt_runtime_us

echo $$ > /dev/cpuset/rt/tasks
echo $$ > /cgroup/cpu/foo/tasks

chrt -r 1 burnP6 &
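
(A rough way to verify the result, assuming taskset and top are
available; burnP6 inherits the shell's cpu affinity:)

# the shell, and thus burnP6, should now be confined to cpu 3
taskset -p $$                  # affinity mask should be 0x8 (cpu 3 only)
# and the RT group should throttle burnP6 to roughly 5% of that cpu
top -b -n 1 | grep burnP6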



2008-06-24 06:15:00

by Max Krasnyansky

Subject: Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

Peter Zijlstra wrote:
> On Wed, 2008-06-18 at 16:12 +0200, Daniel K. wrote:
>> mkdir /dev/cgroup
>> mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup
>>
>> mkdir /dev/cgroup/0
>>
>> echo 3 > /dev/cgroup/0/cpuset.cpus
>> echo 0 > /dev/cgroup/0/cpuset.mems
>> echo 100000 > /dev/cgroup/0/cpu.rt_period_us
>> echo 5000 > /dev/cgroup/0/cpu.rt_runtime_us
>>
>> schedtool -R -p 1 -e burnP6 &
>> [1] 3309
>> echo 3309 > /dev/cgroup/0/tasks
>>
>> At this point I'd expect the burnP6 task to use 5% of the available CPU
>> resources in the cgroup (5000/100000), but the real CPU usage, as
>> reported by top, is 20%. This is 4 times the expected result, and as I
>> have 4 cores, I think that is a strong hint of a correlation.
>>
>> Maybe with a 4-core system there really are 4,000,000 us available for
>> every wall-clock second?
>
> Indeed. In effect each cpu (see below on specifics) gets the
> runtime/period you specify, and it moves unused runtime between cpus.
>
>> However, I have only assigned one core (3) to _this_ cgroup, so I think
>> this cgroup is overusing its assigned resources.
>>
>> What do you think?
>
> I think you're on to something :-)
>
> It uses root domains, that is the largest domain this cpu is part of
> that has load-balancing enabled.
>
> So while you have made your process part of the cgroup and the cpuset,
> there is no strong relation between them; that is to say, I could
> mount the cpuset and cpu controllers on different mount points and add
> tasks to one but not the other.
Daniel is probably really confused by now :).

> So the relation I used is that of load-balance domains.
That's the key thing.

> So in order to get what you intended, do something like:
>
> mkdir -p /dev/cpuset /cgroup/cpu
> mount -t cgroup -o cpuset none /dev/cpuset
> mount -t cgroup -o cpu none /cgroup/cpu
>
> mkdir /dev/cpuset/root
> mkdir /dev/cpuset/rt
>
> #
> # this might not actually make the kernel happy
> # as it might attempt (and possibly succeed in)
> # moving cpu bound kernel threads
> #
> for i in `cat /dev/cpuset/tasks`; do
> echo $i > /dev/cpuset/root/tasks;
> done
It won't let you add tasks before adding cpus.
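
(Presumably that means the root/ cpuset needs its cpus and mems
populated before the task loop above; a minimal reordering sketch,
reusing mems node 0 from Daniel's example:)

# give the new cpuset its cpus and mems first, then move the tasks
echo 0-2 > /dev/cpuset/root/cpuset.cpus
echo 0 > /dev/cpuset/root/cpuset.mems
for i in `cat /dev/cpuset/tasks`; do echo $i > /dev/cpuset/root/tasks; done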

> echo 0-2 > /dev/cpuset/root/cpuset.cpus
> echo 3 > /dev/cpuset/rt/cpuset.cpus
>
> echo 0 > /dev/cpuset/cpuset.sched_load_balance
>
> mkdir /cgroup/cpu/foo
> echo 100000 > /cgroup/cpu/foo/cpu.rt_period_us
> echo 5000 > /cgroup/cpu/foo/cpu.rt_runtime_us
>
> echo $$ > /dev/cpuset/rt/tasks
> echo $$ > /cgroup/cpu/foo/tasks
>
> chrt -r 1 burnP6 &

That seems too complicated :). There is no need to mount them separately. The
only part that was missing from Daniel's example is the sched_load_balance
thingy; otherwise he can still use a single cgroup hierarchy, unless I'm
missing something. In other words:

mkdir /dev/cgroup
mount -t cgroup -o cpu,cpuset cgroup /dev/cgroup

# Setup first domain (cpu 0-2)
mkdir /dev/cgroup/0
echo 0-2 > /dev/cgroup/0/cpuset.cpus
echo 0 > /dev/cgroup/0/cpuset.mems

# Setup second domain (cpu 3)
mkdir /dev/cgroup/1
echo 3 > /dev/cgroup/1/cpuset.cpus
echo 0 > /dev/cgroup/1/cpuset.mems

# Do not balance between domains
echo 0 > /dev/cgroup/cpuset.sched_load_balance

# Move all tasks into first domain if needed
...

# Setup RT bandwidth for second domain
echo 100000 > /dev/cgroup/1/cpu.rt_period_us
echo 5000 > /dev/cgroup/1/cpu.rt_runtime_us

schedtool -R -p 1 -e burnP6 &
[1] 3309
echo 3309 > /dev/cgroup/1/tasks
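
(A rough check of the end result, assuming taskset and top are
available: the task should be pinned to cpu 3 and throttled to about
5% of it:)

# burnP6 should be confined to cpu 3 by the cpuset...
taskset -p 3309        # affinity mask should be 0x8
# ...and throttled by the RT bandwidth of /dev/cgroup/1
top -b -n 1 -p 3309    # %CPU should hover around 5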

Max

2008-06-24 09:53:41

by Peter Zijlstra

Subject: Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

On Mon, 2008-06-23 at 23:14 -0700, Max Krasnyansky wrote:

> That seems too complicated :). There is no need to mount them separately.

Mounting them independently gives you much greater flexibility.

You quickly run into overlapping cpu sets if you want multiple groups on
the isolated part.
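
(A sketch of what that flexibility buys, reusing the split mounts from
the earlier example; the group bar and the pids PID_A/PID_B are
hypothetical: two independent RT budgets can share the one isolated
cpu without the cpuset side having to mirror them.)

# two separate RT budgets on the same isolated cpu 3
mkdir /cgroup/cpu/foo /cgroup/cpu/bar
echo 5000 > /cgroup/cpu/foo/cpu.rt_runtime_us
echo 10000 > /cgroup/cpu/bar/cpu.rt_runtime_us
# tasks of both kinds join the same cpuset, but different cpu groups
echo $PID_A > /dev/cpuset/rt/tasks; echo $PID_A > /cgroup/cpu/foo/tasks
echo $PID_B > /dev/cpuset/rt/tasks; echo $PID_B > /cgroup/cpu/bar/tasks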


2008-06-24 16:50:56

by Max Krasnyansky

Subject: Re: RT-Scheduler/cgroups: Possible overuse of resources assigned via cpu.rt_period_us and cpu.rt_runtime_us

Peter Zijlstra wrote:
> On Mon, 2008-06-23 at 23:14 -0700, Max Krasnyansky wrote:
>
>> That seems too complicated :). There is no need to mount them separately.
>
> Mounting them independently gives you much greater flexibility.
>
> You quickly run into overlapping cpu sets if you want multiple groups on
> the isolated part.

Sure, it is more flexible, but for simple cases it's more confusing.

Max