2011-05-07 06:34:41

by Paul Turner

Subject: [patch 00/15] CFS Bandwidth Control V6

[ Apologies if you're receiving this twice, the previous mailing did not seem
to make it to the list for some reason ].

Hi all,

Please find attached the latest iteration of bandwidth control (v6).

Where the previous release cleaned up many of the semantics surrounding the
update_curr() path and throttling, this release is focused on cleaning up the
patchset itself. Elements such as the expiration of bandwidth from previous
quota periods, as well as some of the core accounting changes, have been pushed
up (and rewritten for clarity) within the patchset, reducing the patch-to-patch
churn significantly.

While this restructuring was fairly extensive in terms of the code touched,
there are no major behavioral changes beyond bug fixes.

Thanks to Hidetoshi Seto for identifying the throttle list corruption.

Notable changes:
- Runtime is now actively expired, taking advantage of the bounds placed on
sched_clock synchronization.
- distribute_cfs_runtime() no longer races with throttles around the period
boundary.
- Major code cleanup

Bug fixes:
- Several interactions with active load-balance have been corrected. These were
previously manifesting as throttle_list corruption and crashes.

Interface:
----------
Three new cgroupfs files are exported by the cpu subsystem:
  cpu.cfs_period_us : period over which bandwidth is to be regulated
  cpu.cfs_quota_us  : bandwidth available for consumption per period
  cpu.stat          : statistics (such as number of throttled periods and
                      total throttled time)
One important interface change that this introduces (versus the rate limits
proposal) is that the defined bandwidth becomes an absolute quantifier.
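
For illustration, capping a group at half a CPU looks roughly like the following
(a minimal sketch; it assumes the cpu subsystem is mounted at /cgroup and a group
named g1 already exists):

  # cap g1 at 250ms of CPU time per 500ms period (i.e. half a CPU)
  echo 500000 > /cgroup/g1/cpu.cfs_period_us
  echo 250000 > /cgroup/g1/cpu.cfs_quota_us

  # a quota of -1 restores unlimited bandwidth
  echo -1 > /cgroup/g1/cpu.cfs_quota_us

  # throttling statistics
  cat /cgroup/g1/cpu.stat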

Previous postings:
-----------------
v5:
https://lkml.org/lkml/2011/3/22/477
v4:
https://lkml.org/lkml/2011/2/23/44
v3:
https://lkml.org/lkml/2010/10/12/44
v2:
http://lkml.org/lkml/2010/4/28/88
Original posting:
http://lkml.org/lkml/2010/2/12/393

Prior approaches:
http://lkml.org/lkml/2010/1/5/44 ["CFS Hard limits v5"]

Thanks,

- Paul


2011-06-07 15:46:01

by Kamalesh Babulal

Subject: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

Hi All,

In our test environment, while testing the CFS Bandwidth V6 patch set
on top of 55922c9d1b84, we observed CPU idle time of between 30% and 40%
while running a CPU-bound test with the cgroup tasks not pinned to the CPUs.
In the inverse case, where the cgroup tasks are pinned to the CPUs, the idle
time seen is nearly zero.

Test Scenario
--------------
- 5 cgroups are created, with the groups assigned 2, 2, 4, 8 and 16 tasks respectively.
- Each cgroup has N sub-cgroups created under it, where N is the NR_TASKS the cgroup
is assigned. i.e. cgroup1 will have two sub-cgroups created under it, with one task
assigned per sub-group.
                        ------------
                        | cgroup 1 |
                        ------------
                          /      \
                         /        \
             --------------    --------------
             |sub-cgroup 1|    |sub-cgroup 2|
             | (task 1)   |    | (task 2)   |
             --------------    --------------

- The top cgroup is given unlimited quota (cpu.cfs_quota_us = -1) and a period of 500ms
(cpu.cfs_period_us = 500000), whereas the sub-cgroups are given 250ms of quota
(cpu.cfs_quota_us = 250000) and a period of 500ms. i.e. the top cgroups are given
unlimited bandwidth, whereas each sub-cgroup is throttled once it has consumed 250ms
within a 500ms period (a minimal one-group setup sketch is shown after this list).

- Additionally, if required, proportional CPU shares can be assigned to cpu.shares
as NR_TASKS * 1024. i.e. cgroup1, with 2 tasks, gets 2 * 1024 = 2048 worth of
cpu.shares. (In the test results published below, all cgroups and sub-cgroups
are given an equal share of 1024.)

- One CPU bound while(1) task is attached to each sub-cgroup.

- The sum-exec time for each cgroup/sub-cgroup is captured from /proc/sched_debug after
60 seconds and analyzed for the run time of the tasks, i.e. per sub-cgroup.
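
The one-group setup referenced above amounts to roughly the following (a sketch only;
it assumes only the cpu controller is mounted at /cgroup and /root/while1 is the
CPU-bound load -- the attached script additionally co-mounts cpuset and copies
cpuset.cpus/cpuset.mems into every group):

  # group 1 with two sub-groups, matching the hierarchy above
  mkdir -p /cgroup/1/1 /cgroup/1/2
  echo -1     > /cgroup/1/cpu.cfs_quota_us     # top group: unlimited
  echo 500000 > /cgroup/1/cpu.cfs_period_us
  for j in 1 2
  do
          echo 500000 > /cgroup/1/$j/cpu.cfs_period_us
          echo 250000 > /cgroup/1/$j/cpu.cfs_quota_us  # 250ms per 500ms period
          echo 1024   > /cgroup/1/$j/cpu.shares
          /root/while1 &
          echo $! > /cgroup/1/$j/tasks                 # attach the CPU hog
  done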

How is the idle CPU time measured ?
------------------------------------
- vmstat stats are logged every 2 seconds, from the point the last while1 task is
attached to the 16th sub-cgroup of cgroup 5 until the 60 sec run is over. After the
run, the idle% of a CPU is calculated by summing the idle column from the vmstat log
and dividing it by the number of samples collected, of course after discarding the
first record from the log (a condensed sketch of this calculation follows).
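
Condensed, that calculation is essentially (a sketch; it assumes the vmstat output
was saved to vmstat_log and that the idle column is the 15th field, as in the
attached script):

  # skip the two header lines and the first sample, then average the idle column
  awk '!/procs|swpd/ { if (++n > 1) id += $15 } END { print id / (n - 1) }' vmstat_log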

How are the tasks pinned to the CPU ?
-------------------------------------
- The cgroup filesystem is mounted with the cpuset and cpu controllers, and for every
2 sub-cgroups one physical CPU is allocated. i.e. one CPU is shared between 1/1 and 1/2
(Group 1, sub-cgroup 1 and sub-cgroup 2). Similarly CPUs 7 to 15 are allocated to 5/1
through 5/16 (Group 5, sub-groups 1 to 16). A sketch of the cpuset assignment follows.
Note that the test machine has 16 CPUs.
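
In other words, pinning is done through the cpuset controller, e.g. (a sketch; CPU
numbering in the attached script actually starts at 0):

  # pin the two sub-groups of group 1 onto the same physical CPU
  echo 0 > /cgroup/1/1/cpuset.cpus
  echo 0 > /cgroup/1/2/cpuset.cpus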

Result for the non-pinned case
------------------------------
Only the hierarchy is created as stated above; cpusets are not assigned per cgroup.

Average CPU Idle percentage 34.8% (measured as explained above)
Bandwidth shared with remaining non-Idle 65.2%

* Note: for the sake of round-off, the values are multiplied by 100.

In the results below, 9.2500 for cgroup1 corresponds to the sum-exec time captured
from /proc/sched_debug for cgroup 1 tasks (including sub-cgroups 1 and 2), which is
in turn ~6% of the non-Idle CPU time (derived as 9.2500 * 65.2 / 100).

Bandwidth of Group 1 = 9.2500 i.e = 6.0300% of non-Idle CPU time 65.2%
|...... subgroup 1/1 = 48.7800 i.e = 2.9400% of 6.0300% Groups non-Idle CPU time
|...... subgroup 1/2 = 51.2100 i.e = 3.0800% of 6.0300% Groups non-Idle CPU time


Bandwidth of Group 2 = 9.0400 i.e = 5.8900% of non-Idle CPU time 65.2%
|...... subgroup 2/1 = 51.0200 i.e = 3.0000% of 5.8900% Groups non-Idle CPU time
|...... subgroup 2/2 = 48.9700 i.e = 2.8800% of 5.8900% Groups non-Idle CPU time


Bandwidth of Group 3 = 16.9300 i.e = 11.0300% of non-Idle CPU time 65.2%
|...... subgroup 3/1 = 26.0300 i.e = 2.8700% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.8800 i.e = 2.8500% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/3 = 22.7800 i.e = 2.5100% of 11.0300% Groups non-Idle CPU time
|...... subgroup 3/4 = 25.2900 i.e = 2.7800% of 11.0300% Groups non-Idle CPU time


Bandwidth of Group 4 = 27.9300 i.e = 18.2100% of non-Idle CPU time 65.2%
|...... subgroup 4/1 = 16.6000 i.e = 3.0200% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/2 = 8.0000 i.e = 1.4500% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/3 = 9.0000 i.e = 1.6300% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/4 = 7.9600 i.e = 1.4400% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.3500 i.e = 2.2400% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/6 = 16.2500 i.e = 2.9500% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.6100 i.e = 2.2900% of 18.2100% Groups non-Idle CPU time
|...... subgroup 4/8 = 17.1900 i.e = 3.1300% of 18.2100% Groups non-Idle CPU time


Bandwidth of Group 5 = 36.8300 i.e = 24.0100% of non-Idle CPU time 65.2%
|...... subgroup 5/1 = 56.6900 i.e = 13.6100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/2 = 8.8600 i.e = 2.1200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/3 = 5.5100 i.e = 1.3200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/4 = 4.5700 i.e = 1.0900% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/5 = 7.9500 i.e = 1.9000% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/6 = 2.1600 i.e = .5100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/7 = 2.3400 i.e = .5600% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/8 = 2.1500 i.e = .5100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/9 = 9.7200 i.e = 2.3300% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/10 = 5.0600 i.e = 1.2100% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/11 = 4.6900 i.e = 1.1200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/12 = 8.9700 i.e = 2.1500% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/13 = 8.4600 i.e = 2.0300% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/14 = 11.8400 i.e = 2.8400% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.3400 i.e = 1.5200% of 24.0100% Groups non-Idle CPU time
|...... subgroup 5/16 = 5.1500 i.e = 1.2300% of 24.0100% Groups non-Idle CPU time

Pinned case
--------------
CPU hierarchy is created and cpusets are allocated.

Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%

Bandwidth of Group 1 = 6.3400 i.e = 6.3400% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0400 i.e = 3.1700% of 6.3400% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9500 i.e = 3.1600% of 6.3400% Groups non-Idle CPU time


Bandwidth of Group 2 = 6.3200 i.e = 6.3200% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0400 i.e = 3.1600% of 6.3200% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9500 i.e = 3.1500% of 6.3200% Groups non-Idle CPU time


Bandwidth of Group 3 = 12.6300 i.e = 12.6300% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0300 i.e = 3.1600% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0100 i.e = 3.1500% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0000 i.e = 3.1500% of 12.6300% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9400 i.e = 3.1400% of 12.6300% Groups non-Idle CPU time


Bandwidth of Group 4 = 25.1000 i.e = 25.1000% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5400 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5100 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5300 i.e = 3.1400% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1300% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4900 i.e = 3.1300% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4700 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4700 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4500 i.e = 3.1200% of 25.1000% Groups non-Idle CPU time


Bandwidth of Group 5 = 49.5700 i.e = 49.5700% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.8500 i.e = 24.7100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2900 i.e = 3.1100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2800 i.e = 3.1100% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2700 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2700 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2600 i.e = 3.1000% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2400 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.0900% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2300 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2300 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2200 i.e = 3.0800% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2100 i.e = 3.0700% of 49.5700% Groups non-Idle CPU time

With equal cpu shares allocated to all the groups/sub-cgroups and CFS bandwidth configured
to allow 100% CPU utilization, we see CPU idle time in the un-pinned case.

The benchmark used to reproduce the issue is attached. Just executing the script should
report similar numbers.

#!/bin/bash

NR_TASKS1=2
NR_TASKS2=2
NR_TASKS3=4
NR_TASKS4=8
NR_TASKS5=16

BANDWIDTH=1
SUBGROUP=1
PRO_SHARES=0
MOUNT=/cgroup/
LOAD=/root/while1

usage()
{
	echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
	echo "-b 1|0 set/unset cgroups bandwidth control (default set)"
	echo "-s Create sub-groups for every task (default creates sub-groups)"
	echo "-p Create proportional shares based on cpus"
	exit
}
while getopts ":b:s:p:" arg
do
	case $arg in
	b)
		BANDWIDTH=$OPTARG
		shift
		if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]	# reject values outside 0..1
		then
			usage
		fi
		;;
	s)
		SUBGROUP=$OPTARG
		shift
		if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
		then
			usage
		fi
		;;
	p)
		PRO_SHARES=$OPTARG
		shift
		if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
		then
			usage
		fi
		;;
	*)
		;;
	esac
done
if [ ! -d $MOUNT ]
then
	mkdir -p $MOUNT
fi
test()
{
	echo -n "[ "
	if [ $1 -eq 0 ]
	then
		echo -ne '\E[42;40mOk'
	else
		echo -ne '\E[31;40mFailed'
		tput sgr0
		echo " ]"
		exit
	fi
	tput sgr0
	echo " ]"
}
mount_cgrp()
{
	echo -n "Mounting root cgroup "
	mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT &> /dev/null
	test $?
}

umount_cgrp()
{
	echo -n "Unmounting root cgroup "
	cd /root/
	umount $MOUNT
	test $?
}

create_hierarchy()
{
	mount_cgrp
	cpuset_mem=`cat $MOUNT/cpuset.mems`
	cpuset_cpu=`cat $MOUNT/cpuset.cpus`
	echo -n "creating groups/sub-groups ..."
	for (( i=1; i<=5; i++ ))
	do
		mkdir $MOUNT/$i
		echo $cpuset_mem > $MOUNT/$i/cpuset.mems
		echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
		echo -n ".."
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				mkdir -p $MOUNT/$i/$j
				echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
				echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
				echo -n ".."
			done
		fi
	done
	echo "."
}

cleanup()
{
	pkill -9 while1 &> /dev/null
	sleep 10
	echo -n "Umount groups/sub-groups .."
	for (( i=1; i<=5; i++ ))
	do
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				rmdir $MOUNT/$i/$j
				echo -n ".."
			done
		fi
		rmdir $MOUNT/$i
		echo -n ".."
	done
	echo " "
	umount_cgrp
}

load_tasks()
{
	for (( i=1; i<=5; i++ ))
	do
		jj=$(eval echo "\$NR_TASKS$i")
		shares="1024"
		if [ $PRO_SHARES -eq 1 ]
		then
			eval shares=$(echo "$jj * 1024" | bc)
		fi
		echo $hares > $MOUNT/$i/cpu.shares
		for (( j=1; j<=$jj; j++ ))
		do
			echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
			echo "500000" > $MOUNT/$i/cpu.cfs_period_us
			if [ $SUBGROUP -eq 1 ]
			then
				$LOAD &
				echo $! > $MOUNT/$i/$j/tasks
				echo "1024" > $MOUNT/$i/$j/cpu.shares

				if [ $BANDWIDTH -eq 1 ]
				then
					echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
					echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
				fi
			else
				$LOAD &
				echo $! > $MOUNT/$i/tasks
				echo $shares > $MOUNT/$i/cpu.shares

				if [ $BANDWIDTH -eq 1 ]
				then
					echo "500000" > $MOUNT/$i/cpu.cfs_period_us
					echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
				fi
			fi
		done
	done
	echo "Capturing idle cpu time with vmstat...."
	vmstat 2 100 &> vmstat_log &
}

pin_tasks()
{
	cpu=0
	count=1
	for (( i=1; i<=5; i++ ))
	do
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				if [ $count -gt 2 ]
				then
					cpu=$((cpu+1))
					count=1
				fi
				echo $cpu > $MOUNT/$i/$j/cpuset.cpus
				count=$((count+1))
			done
		else
			case $i in
			1)
				echo 0 > $MOUNT/$i/cpuset.cpus;;
			2)
				echo 1 > $MOUNT/$i/cpuset.cpus;;
			3)
				echo "2-3" > $MOUNT/$i/cpuset.cpus;;
			4)
				echo "4-6" > $MOUNT/$i/cpuset.cpus;;
			5)
				echo "7-15" > $MOUNT/$i/cpuset.cpus;;
			esac
		fi
	done
}

print_results()
{
	eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
	for (( i=1; i<=5; i++ ))
	do
		eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
		eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
		eval avg=$(echo "scale=4;($temp / $gtot) * 100" | bc)
		eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # For pretty format
		echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
				eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
				eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
				echo -n "|"
				echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
			done
		fi
		echo " "
		echo " "
	done
}
capture_results()
{
	cat /proc/sched_debug > sched_log
	pkill -9 vmstat -c
	avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/NR)}')

	rem=$(echo "scale=2; 100 - $avg" |bc)
	echo "Average CPU Idle percentage $avg%"
	echo "Bandwidth shared with remaining non-Idle $rem%"
	for (( i=1; i<=5; i++ ))
	do
		cat sched_log |grep -i while1|grep -i " \/$i" > sched_log_$i
		if [ $SUBGROUP -eq 1 ]
		then
			jj=$(eval echo "\$NR_TASKS$i")
			for (( j=1; j<=$jj; j++ ))
			do
				cat sched_log |grep -i while1|grep -i " \/$i\/$j" > sched_log_$i-$j
			done
		fi
	done
	print_results $rem
}
create_hierarchy
pin_tasks

load_tasks
sleep 60
capture_results
cleanup
exit

Thanks,
Kamalesh.

2011-06-08 03:09:44

by Paul Turner

Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

[ Sorry for the delayed response, I was out on vacation for the second
half of May until last week -- I've now caught up on email and am
preparing the next posting ]

Thanks for the test-case, Kamalesh -- my immediate suspicion is that quota
return may not be fine-grained enough (although the numbers provided are
large enough that it's possible there's also just a bug).

I have some tools from my own testing that I can use to pull this apart;
let me run your workload and get back to you.

On Tue, Jun 7, 2011 at 8:45 AM, Kamalesh Babulal
<[email protected]> wrote:
> [ full quote of the original report and test script snipped ]

2011-06-08 10:46:21

by Vladimir Davydov

Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
> Hi All,
>
> In our test environment, while testing the CFS Bandwidth V6 patch set
> on top of 55922c9d1b84. We observed that the CPU's idle time is seen
> between 30% to 40% while running CPU bound test, with the cgroups tasks
> not pinned to the CPU's. Whereas in the inverse case, where the cgroups
> tasks are pinned to the CPU's, the idle time seen is nearly zero.

(snip)

> load_tasks()
> {
> for (( i=1; i<=5; i++ ))
> do
> jj=$(eval echo "\$NR_TASKS$i")
> shares="1024"
> if [ $PRO_SHARES -eq 1 ]
> then
> eval shares=$(echo "$jj * 1024" | bc)
> fi
> echo $hares > $MOUNT/$i/cpu.shares
        ^^^^^
a fatal misprint? must be shares, I guess

(Setting cpu.shares to "", i.e. to the minimal possible value, will
definitely confuse the load balancer)

> for (( j=1; j<=$jj; j++ ))
> do
> echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> if [ $SUBGROUP -eq 1 ]
> then
>
> $LOAD &
> echo $! > $MOUNT/$i/$j/tasks
> echo "1024" > $MOUNT/$i/$j/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
> fi
> else
> $LOAD &
> echo $! > $MOUNT/$i/tasks
> echo $shares > $MOUNT/$i/cpu.shares
>
> if [ $BANDWIDTH -eq 1 ]
> then
> echo "500000" > $MOUNT/$i/cpu.cfs_period_us
> echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
> fi
> fi
> done
> done
> echo "Captuing idle cpu time with vmstat...."
> vmstat 2 100 &> vmstat_log &
> }

2011-06-08 16:33:00

by Kamalesh Babulal

Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

* Vladimir Davydov <[email protected]> [2011-06-08 14:46:06]:

> On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
> > Hi All,
> >
> > In our test environment, while testing the CFS Bandwidth V6 patch set
> > on top of 55922c9d1b84. We observed that the CPU's idle time is seen
> > between 30% to 40% while running CPU bound test, with the cgroups tasks
> > not pinned to the CPU's. Whereas in the inverse case, where the cgroups
> > tasks are pinned to the CPU's, the idle time seen is nearly zero.
>
> (snip)
>
> > load_tasks()
> > {
> > for (( i=1; i<=5; i++ ))
> > do
> > jj=$(eval echo "\$NR_TASKS$i")
> > shares="1024"
> > if [ $PRO_SHARES -eq 1 ]
> > then
> > eval shares=$(echo "$jj * 1024" | bc)
> > fi
> > echo $hares > $MOUNT/$i/cpu.shares
>         ^^^^^
> a fatal misprint? must be shares, I guess
>
> (Setting cpu.shares to "", i.e. to the minimal possible value, will
> definitely confuse the load balancer)

My bad. It was a fatal typo, thanks for pointing it out. It made a big difference
in the idle time reported. After correcting it to $shares, the CPU idle time
reported is now 20% to 22%, which is about 10% less than the previously reported number.

(snip)

There have been questions on how to interpret the results. Consider the
following test run without pinning of the cgroup tasks:

Average CPU Idle percentage 20%
Bandwidth shared with remaining non-Idle 80%

Bandwidth of Group 1 = 7.9700% i.e = 6.3700% of non-Idle CPU time 80%
|...... subgroup 1/1 = 50.0200 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9700 i.e = 3.1800% of 6.3700% Groups non-Idle CPU time

For example, let us consider cgroup1; the sum_exec time is the 7th field
captured from /proc/sched_debug:

while1 27273  30665.912793  1988  120  30665.912793  30909.566767  0.021951  /1/2
while1 27272  30511.105690  1995  120  30511.105690  30942.998099  0.017369  /1/1
                                                      -------------
                                                       61852.564866
                                                      -------------
- The bandwidth for sub-cgroup 1 of cgroup1 is calculated as (30909.566767 * 100) / 61852.564866
  = ~50%

  and for sub-cgroup 2 of cgroup1 as (30942.998099 * 100) / 61852.564866
  = ~50%

Similarly, if we add up the sum_exec of all the groups:
------------------------------------------------------------------------------------------------
Group1         Group2         Group3          Group4          Group5             sum_exec
------------------------------------------------------------------------------------------------
61852.564866 + 61686.604930 + 122840.294858 + 232576.303937 + 296166.889155  =  775122.657746

Again taking the example of cgroup1:
Total percentage of bandwidth allocated to cgroup1 = (61852.564866 * 100) / 775122.657746
                                                   = ~7.9% of the total bandwidth of all the cgroups


Calculating the non-idle time is done as
Total (execution time * 100) / (no of cpus * 60000 ms)   [the script runs for 60 seconds]
i.e. = (775122.657746 * 100) / (16 * 60000)
     = ~80% non-idle time

The percentage of non-idle bandwidth allocated to cgroup1 is then derived as
     = (cgroup bandwidth percentage * non-idle time) / 100
     = for cgroup1: (7.9700 * 80) / 100
     = 6.376% of non-Idle CPU time (the whole calculation is condensed into the sketch below).
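
The above arithmetic, condensed into a few lines of shell (a sketch only, using bc as
the attached script does; the values are the ones from this run, and the remaining
groups below follow the same scheme):

  group_exec=61852.564866       # sum_exec of cgroup1 (both sub-groups)
  total_exec=775122.657746      # sum_exec over all cgroups
  nr_cpus=16

  non_idle=$(echo "scale=4; ($total_exec * 100) / ($nr_cpus * 60000)" | bc)   # ~80%
  group_pct=$(echo "scale=4; ($group_exec * 100) / $total_exec" | bc)         # ~7.97%
  group_non_idle=$(echo "scale=4; ($group_pct * $non_idle) / 100" | bc)       # ~6.37%
  echo "Bandwidth of Group 1 = $group_pct i.e = $group_non_idle% of non-Idle CPU time $non_idle%"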


Bandwidth of Group 2 = 7.9500% i.e = 6.3600% of non-Idle CPU time 80%
|...... subgroup 2/1 = 49.9900 i.e = 3.1700% of 6.3600% Groups non-Idle CPU time
|...... subgroup 2/2 = 50.0000 i.e = 3.1800% of 6.3600% Groups non-Idle CPU time


Bandwidth of Group 3 = 15.8400% i.e = 12.6700% of non-Idle CPU time 80%
|...... subgroup 3/1 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9900 i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0600 i.e = 3.1700% of 12.6700% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9400 i.e = 3.1500% of 12.6700% Groups non-Idle CPU time


Bandwidth of Group 4 = 30.0000% i.e = 24.0000% of non-Idle CPU time 80%
|...... subgroup 4/1 = 13.1600 i.e = 3.1500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/2 = 11.3800 i.e = 2.7300% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/3 = 13.1100 i.e = 3.1400% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.3100 i.e = 2.9500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.8200 i.e = 3.0700% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/6 = 11.0600 i.e = 2.6500% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/7 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
|...... subgroup 4/8 = 13.0600 i.e = 3.1300% of 24.0000% Groups non-Idle CPU time


Bandwidth of Group 5 = 38.2000% i.e = 30.5600% of non-Idle CPU time 80%
|...... subgroup 5/1 = 48.1000 i.e = 14.6900% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.7900 i.e = 2.0700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.3700 i.e = 1.9400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/4 = 5.1800 i.e = 1.5800% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/5 = 5.0400 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/6 = 10.1400 i.e = 3.0900% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/7 = 5.0700 i.e = 1.5400% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.3900 i.e = 1.9500% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.8800 i.e = 2.1000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.4700 i.e = 1.9700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.5600 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/12 = 4.6400 i.e = 1.4100% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/13 = 7.4900 i.e = 2.2800% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/14 = 5.8200 i.e = 1.7700% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.5500 i.e = 2.0000% of 30.5600% Groups non-Idle CPU time
|...... subgroup 5/16 = 5.2700 i.e = 1.6100% of 30.5600% Groups non-Idle CPU time

Thanks,
Kamalesh.

2011-06-09 03:25:38

by Paul Turner

Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

Hi Kamalesh,

I'm unable to reproduce the results you describe. One possibility is
load-balancer interaction -- can you describe the topology of the
platform you are running this on?

On both a straight NUMA topology and a hyper-threaded platform I
observe a ~4% delta between the pinned and un-pinned cases.

Thanks -- results below,

- Paul


16 cores -- pinned:
Average CPU Idle percentage 4.77419%
Bandwidth shared with remaining non-Idle 95.22581%
Bandwidth of Group 1 = 6.6300 i.e = 6.3100% of non-Idle CPU time 95.22581%
|...... subgroup 1/1 = 50.0400 i.e = 3.1500% of 6.3100%
Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9500 i.e = 3.1500% of 6.3100%
Groups non-Idle CPU time


Bandwidth of Group 2 = 6.6300 i.e = 6.3100% of non-Idle CPU time 95.22581%
|...... subgroup 2/1 = 50.0300 i.e = 3.1500% of 6.3100%
Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9600 i.e = 3.1500% of 6.3100%
Groups non-Idle CPU time


Bandwidth of Group 3 = 13.2000 i.e = 12.5600% of non-Idle CPU time 95.22581%
|...... subgroup 3/1 = 25.0200 i.e = 3.1400% of 12.5600%
Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9500 i.e = 3.1300% of 12.5600%
Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0400 i.e = 3.1400% of 12.5600%
Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9700 i.e = 3.1300% of 12.5600%
Groups non-Idle CPU time


Bandwidth of Group 4 = 26.1500 i.e = 24.9000% of non-Idle CPU time 95.22581%
|...... subgroup 4/1 = 12.4700 i.e = 3.1000% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5500 i.e = 3.1200% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/3 = 12.4600 i.e = 3.1000% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1100% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5400 i.e = 3.1200% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4700 i.e = 3.1000% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/7 = 12.5200 i.e = 3.1100% of 24.9000%
Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4600 i.e = 3.1000% of 24.9000%
Groups non-Idle CPU time


Bandwidth of Group 5 = 47.3600 i.e = 45.0900% of non-Idle CPU time 95.22581%
|...... subgroup 5/1 = 49.9600 i.e = 22.5200% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/2 = 6.3600 i.e = 2.8600% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2400 i.e = 2.8100% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/4 = 6.1900 i.e = 2.7900% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2700 i.e = 2.8200% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/6 = 6.3400 i.e = 2.8500% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/7 = 6.1900 i.e = 2.7900% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/8 = 6.1500 i.e = 2.7700% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2600 i.e = 2.8200% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2800 i.e = 2.8300% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2800 i.e = 2.8300% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/12 = 6.1400 i.e = 2.7600% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/13 = 6.0900 i.e = 2.7400% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/14 = 6.3000 i.e = 2.8400% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/15 = 6.1600 i.e = 2.7700% of 45.0900%
Groups non-Idle CPU time
|...... subgroup 5/16 = 6.3400 i.e = 2.8500% of 45.0900%
Groups non-Idle CPU time

AMD 16 core -- pinned:
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.2800 i.e = 6.2800% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0000 i.e = 3.1400% of 6.2800%
Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9900 i.e = 3.1300% of 6.2800%
Groups non-Idle CPU time


Bandwidth of Group 2 = 6.2800 i.e = 6.2800% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0000 i.e = 3.1400% of 6.2800%
Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9900 i.e = 3.1300% of 6.2800%
Groups non-Idle CPU time


Bandwidth of Group 3 = 12.5500 i.e = 12.5500% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0100 i.e = 3.1300% of 12.5500%
Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 3.1300% of 12.5500%
Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 3.1300% of 12.5500%
Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9700 i.e = 3.1300% of 12.5500%
Groups non-Idle CPU time


Bandwidth of Group 4 = 25.0400 i.e = 25.0400% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5000 i.e = 3.1300% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 3.1300% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 3.1300% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1300% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5000 i.e = 3.1300% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 3.1200% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 3.1200% of 25.0400%
Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 3.1200% of 25.0400%
Groups non-Idle CPU time


Bandwidth of Group 5 = 49.8200 i.e = 49.8200% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.9400 i.e = 24.8800% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2700 i.e = 3.1200% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 3.1100% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 3.1100% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2600 i.e = 3.1100% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2600 i.e = 3.1100% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2200 i.e = 3.0900% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2200 i.e = 3.0900% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2400 i.e = 3.1000% of 49.8200%
Groups non-Idle CPU time


16 core hyper-threaded subset of 24 core machine (threads not pinned
individually):

Average CPU Idle percentage 35.0645%
Bandwidth shared with remaining non-Idle 64.9355%
Bandwidth of Group 1 = 6.6000 i.e = 4.2800% of non-Idle CPU time 64.9355%
|...... subgroup 1/1 = 50.0600 i.e = 2.1400% of 4.2800%
Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9300 i.e = 2.1300% of 4.2800%
Groups non-Idle CPU time


Bandwidth of Group 2 = 6.6000 i.e = 4.2800% of non-Idle CPU time 64.9355%
|...... subgroup 2/1 = 50.0100 i.e = 2.1400% of 4.2800%
Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9800 i.e = 2.1300% of 4.2800%
Groups non-Idle CPU time


Bandwidth of Group 3 = 13.1600 i.e = 8.5400% of non-Idle CPU time 64.9355%
|...... subgroup 3/1 = 25.0200 i.e = 2.1300% of 8.5400%
Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9900 i.e = 2.1300% of 8.5400%
Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 2.1300% of 8.5400%
Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 2.1300% of 8.5400%
Groups non-Idle CPU time


Bandwidth of Group 4 = 25.9700 i.e = 16.8600% of non-Idle CPU time 64.9355%
|...... subgroup 4/1 = 12.5000 i.e = 2.1000% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5100 i.e = 2.1000% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/3 = 12.6000 i.e = 2.1200% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/4 = 12.3800 i.e = 2.0800% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4700 i.e = 2.1000% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 2.1000% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/7 = 12.5700 i.e = 2.1100% of 16.8600%
Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4400 i.e = 2.0900% of 16.8600%
Groups non-Idle CPU time


Bandwidth of Group 5 = 47.6500 i.e = 30.9400% of non-Idle CPU time 64.9355%
|...... subgroup 5/1 = 50.5400 i.e = 15.6300% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/2 = 6.0400 i.e = 1.8600% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/3 = 6.0600 i.e = 1.8700% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/4 = 6.4300 i.e = 1.9800% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/5 = 6.3100 i.e = 1.9500% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/6 = 6.0000 i.e = 1.8500% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/7 = 6.3100 i.e = 1.9500% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/8 = 5.9800 i.e = 1.8500% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2900 i.e = 1.9400% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/10 = 6.3300 i.e = 1.9500% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/11 = 6.5200 i.e = 2.0100% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/12 = 6.0500 i.e = 1.8700% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/13 = 6.3500 i.e = 1.9600% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/14 = 6.3500 i.e = 1.9600% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/15 = 6.3400 i.e = 1.9600% of 30.9400%
Groups non-Idle CPU time
|...... subgroup 5/16 = 6.4200 i.e = 1.9800% of 30.9400%
Groups non-Idle CPU time


16 core hyper-threaded subset of 24 core machine (threads pinned individually):

Average CPU Idle percentage 31.7419%
Bandwidth shared with remaining non-Idle 68.2581%
Bandwidth of Group 1 = 6.2700 i.e = 4.2700% of non-Idle CPU time 68.2581%
|...... subgroup 1/1 = 50.0100 i.e = 2.1300% of 4.2700%
Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9800 i.e = 2.1300% of 4.2700%
Groups non-Idle CPU time


Bandwidth of Group 2 = 6.2700 i.e = 4.2700% of non-Idle CPU time 68.2581%
|...... subgroup 2/1 = 50.0100 i.e = 2.1300% of 4.2700%
Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9800 i.e = 2.1300% of 4.2700%
Groups non-Idle CPU time


Bandwidth of Group 3 = 12.5300 i.e = 8.5500% of non-Idle CPU time 68.2581%
|...... subgroup 3/1 = 25.0100 i.e = 2.1300% of 8.5500%
Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 2.1300% of 8.5500%
Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 2.1300% of 8.5500%
Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9800 i.e = 2.1300% of 8.5500%
Groups non-Idle CPU time


Bandwidth of Group 4 = 25.0200 i.e = 17.0700% of non-Idle CPU time 68.2581%
|...... subgroup 4/1 = 12.5100 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/5 = 12.5000 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 2.1300% of 17.0700%
Groups non-Idle CPU time


Bandwidth of Group 5 = 49.8900 i.e = 34.0500% of non-Idle CPU time 68.2581%
|...... subgroup 5/1 = 49.9600 i.e = 17.0100% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2600 i.e = 2.1300% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2600 i.e = 2.1300% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2500 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2400 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2300 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2300 i.e = 2.1200% of 34.0500% Groups non-Idle CPU time

On Wed, Jun 8, 2011 at 9:32 AM, Kamalesh Babulal
<[email protected]> wrote:
> * Vladimir Davydov <[email protected]> [2011-06-08 14:46:06]:
>
>> On Tue, 2011-06-07 at 19:45 +0400, Kamalesh Babulal wrote:
>> > Hi All,
>> >
>> > In our test environment, while testing the CFS Bandwidth V6 patch set
>> > on top of 55922c9d1b84. We observed that the CPU's idle time is seen
>> > between 30% to 40% while running CPU bound test, with the cgroups tasks
>> > not pinned to the CPU's. Whereas in the inverse case, where the cgroups
>> > tasks are pinned to the CPU's, the idle time seen is nearly zero.
>>
>> (snip)
>>
>> > load_tasks()
>> > {
>> >         for (( i=1; i<=5; i++ ))
>> >         do
>> >                 jj=$(eval echo "\$NR_TASKS$i")
>> >                 shares="1024"
>> >                 if [ $PRO_SHARES -eq 1 ]
>> >                 then
>> >                         eval shares=$(echo "$jj * 1024" | bc)
>> >                 fi
>> >                 echo $hares > $MOUNT/$i/cpu.shares
>>                           ^^^^^
>>                           a fatal misprint? must be shares, I guess
>>
>> (Setting cpu.shares to "", i.e. to the minimal possible value, will
>> definitely confuse the load balancer)
>
> My bad. It was a fatal typo, thanks for pointing it out. It made a big difference
> in the idle time reported. After correcting it to $shares, the CPU idle time
> reported is now 20% to 22%, which is about 10% less than the previously reported number.
>
> (snip)
>
> There have been questions on how to interpret the results. Consider the
> following test run without pinning of the cgroup tasks:
>
> Average CPU Idle percentage 20%
> Bandwidth shared with remaining non-Idle 80%
>
> Bandwidth of Group 1 = 7.9700% i.e = 6.3700% of non-Idle CPU time 80%
> |...... subgroup 1/1    = 50.0200       i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
> |...... subgroup 1/2    = 49.9700       i.e = 3.1800% of 6.3700% Groups non-Idle CPU time
>
> For example, let us consider cgroup1; sum_exec time is the 7th field
> captured from /proc/sched_debug:
>
> while1 27273     30665.912793      1988   120     30665.912793  30909.566767         0.021951 /1/2
> while1 27272     30511.105690      1995   120     30511.105690  30942.998099         0.017369 /1/1
>                                                              -----------------
>                                                                61852.564866
>                                                              -----------------
>  - The bandwidth for sub-cgroup1 of cgroup1 is calculated  = (30942.998099 * 100) / 61852.564866
>                                                            = ~50%
>
>    and sub-cgroup2 of cgroup1 is calculated                = (30909.566767 * 100) / 61852.564866
>                                                            = ~50%
>
> In a similar way, if we add up the sum_exec of all the groups:
> ------------------------------------------------------------------------------------------------
> Group1          Group2          Group3          Group4          Group5          sum_exec
> ------------------------------------------------------------------------------------------------
> 61852.564866 + 61686.604930 + 122840.294858 + 232576.303937 + 296166.889155 =   775122.657746
>
> again taking the example of cgroup1,
> Total percentage of bandwidth allocated to cgroup1 = (61852.564866 * 100) / 775122.657746
>                                                    = ~7.9% of the total bandwidth of all the cgroups
>
>
> Calculating the non-idle time is done with
>        Total (execution time * 100) / (no of cpus * 60000 ms)   [the script is run for 60 seconds]
>        i.e. = (775122.657746 * 100) / (16 * 60000)
>             = ~80% non-idle time
>
> The percentage of bandwidth allocated to cgroup1 out of the non-idle time is derived as
>        = (cgroup bandwidth percentage * non-idle time) / 100
>        = for cgroup1   = (7.9700 * 80) / 100
>                        = 6.376% bandwidth allocated of the non-Idle CPU time.
>
>
> Bandwidth of Group 2 = 7.9500% i.e = 6.3600% of non-Idle CPU time 80%
> |...... subgroup 2/1    = 49.9900       i.e = 3.1700% of 6.3600% Groups non-Idle CPU time
> |...... subgroup 2/2    = 50.0000       i.e = 3.1800% of 6.3600% Groups non-Idle CPU time
>
>
> Bandwidth of Group 3 = 15.8400% i.e = 12.6700% of non-Idle CPU time 80%
> |...... subgroup 3/1    = 24.9900       i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/2    = 24.9900       i.e = 3.1600% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/3    = 25.0600       i.e = 3.1700% of 12.6700% Groups non-Idle CPU time
> |...... subgroup 3/4    = 24.9400       i.e = 3.1500% of 12.6700% Groups non-Idle CPU time
>
>
> Bandwidth of Group 4 = 30.0000% i.e = 24.0000% of non-Idle CPU time 80%
> |...... subgroup 4/1    = 13.1600       i.e = 3.1500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/2    = 11.3800       i.e = 2.7300% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/3    = 13.1100       i.e = 3.1400% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/4    = 12.3100       i.e = 2.9500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/5    = 12.8200       i.e = 3.0700% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/6    = 11.0600       i.e = 2.6500% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/7    = 13.0600       i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
> |...... subgroup 4/8    = 13.0600       i.e = 3.1300% of 24.0000% Groups non-Idle CPU time
>
>
> Bandwidth of Group 5 = 38.2000% i.e = 30.5600% of non-Idle CPU time 80%
> |...... subgroup 5/1    = 48.1000       i.e = 14.6900% of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/2    = 6.7900        i.e = 2.0700%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/3    = 6.3700        i.e = 1.9400%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/4    = 5.1800        i.e = 1.5800%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/5    = 5.0400        i.e = 1.5400%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/6    = 10.1400       i.e = 3.0900%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/7    = 5.0700        i.e = 1.5400%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/8    = 6.3900        i.e = 1.9500%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/9    = 6.8800        i.e = 2.1000%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/10   = 6.4700        i.e = 1.9700%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/11   = 6.5600        i.e = 2.0000%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/12   = 4.6400        i.e = 1.4100%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/13   = 7.4900        i.e = 2.2800%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/14   = 5.8200        i.e = 1.7700%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/15   = 6.5500        i.e = 2.0000%  of 30.5600% Groups non-Idle CPU time
> |...... subgroup 5/16   = 5.2700        i.e = 1.6100%  of 30.5600% Groups non-Idle CPU time
>
> Thanks,
> Kamalesh.
>
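
The per-group percentages above can be recomputed from the /proc/sched_debug dump
saved by the script. A minimal sketch for Group 1, assuming the sched_log and
sched_log_1 files produced by capture_results() and the 16-cpu, 60-second run
described above:

#!/bin/bash
# sum of sum_exec (7th field) over all while1 tasks, and over group 1 only
total_exec=$(grep -i while1 sched_log | sed 's/R//g' | awk '{sum+=$7} END {printf "%f", sum}')
group_exec=$(sed 's/R//g' sched_log_1 | awk '{sum+=$7} END {printf "%f", sum}')
# non-idle share of the machine: total runtime vs. 16 cpus * 60000 ms
nonidle=$(echo "scale=4; ($total_exec * 100) / (16 * 60000)" | bc)
# group 1 as a share of all groups, and of the whole machine
share=$(echo "scale=4; ($group_exec * 100) / $total_exec" | bc)
machine=$(echo "scale=4; $share * $nonidle / 100" | bc)
echo "Group 1: $share% of all groups, $machine% of total CPU time"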

2011-06-10 18:17:43

by Kamalesh Babulal

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

* Paul Turner <[email protected]> [2011-06-08 20:25:00]:

> Hi Kamalesh,
>
> I'm unable to reproduce the results you describe. One possibility is
> load-balancer interaction -- can you describe the topology of the
> platform you are running this on?
>
> On both a straight NUMA topology and a hyper-threaded platform I
> observe a ~4% delta between the pinned and un-pinned cases.
>
> Thanks -- results below,
>
> - Paul
>
>
(snip)

Hi Paul,

That box is down. I tried running the test on the 2-socket quad-core with
HT and was not able to reproduce the issue there: the CPU idle time reported in
both the pinned and un-pinned cases was ~0. But if we create a cgroup hierarchy
of 3 levels above the 5 cgroups, instead of the current hierarchy where all
the 5 cgroups are created directly under /cgroup, the idle time is again seen
on the 2-socket quad-core (HT) box.

-----------
| cgroups |
-----------
|
-----------
| level 1 |
-----------
|
-----------
| level 2 |
-----------
|
-----------
| level 3 |
-----------
/ / | \ \
/ / | \ \
cgrp1 cgrp2 cgrp3 cgrp4 cgrp5
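
(With MOUNT_POINT=/cgroups/ and LEVELS=3, as in the modified script below, the task
groups therefore end up at paths of the form, for illustration:

/cgroups/level1/level2/level3/1/1     ... first sub-group of cgroup 1
/cgroups/level1/level2/level3/5/16    ... last sub-group of cgroup 5

with the 250ms/500ms quota applied at the leaf sub-group level.)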


Un-pinned run
--------------

Average CPU Idle percentage 24.8333%
Bandwidth shared with remaining non-Idle 75.1667%
Bandwidth of Group 1 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 1/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 1/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time


Bandwidth of Group 2 = 8.3700 i.e = 6.2900% of non-Idle CPU time 75.1667%
|...... subgroup 2/1 = 49.9900 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time
|...... subgroup 2/2 = 50.0000 i.e = 3.1400% of 6.2900% Groups non-Idle CPU time


Bandwidth of Group 3 = 16.6500 i.e = 12.5100% of non-Idle CPU time 75.1667%
|...... subgroup 3/1 = 25.0000 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/2 = 24.9100 i.e = 3.1100% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/3 = 25.0800 i.e = 3.1300% of 12.5100% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 3.1200% of 12.5100% Groups non-Idle CPU time


Bandwidth of Group 4 = 29.3600 i.e = 22.0600% of non-Idle CPU time 75.1667%
|...... subgroup 4/1 = 12.0200 i.e = 2.6500% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.3800 i.e = 2.7300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/3 = 13.6300 i.e = 3.0000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.7000 i.e = 2.8000% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.8000 i.e = 2.8200% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/6 = 11.9600 i.e = 2.6300% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.7400 i.e = 2.8100% of 22.0600% Groups non-Idle CPU time
|...... subgroup 4/8 = 11.7300 i.e = 2.5800% of 22.0600% Groups non-Idle CPU time


Bandwidth of Group 5 = 37.2300 i.e = 27.9800% of non-Idle CPU time 75.1667%
|...... subgroup 5/1 = 47.7200 i.e = 13.3500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/2 = 5.2000 i.e = 1.4500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.3600 i.e = 1.7700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/5 = 7.9800 i.e = 2.2300% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/6 = 5.1800 i.e = 1.4400% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/7 = 7.4900 i.e = 2.0900% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/8 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/9 = 7.7500 i.e = 2.1600% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/10 = 4.8100 i.e = 1.3400% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/11 = 4.9300 i.e = 1.3700% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.8900 i.e = 1.9200% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.0700 i.e = 1.6900% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.5200 i.e = 1.8200% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/15 = 5.9200 i.e = 1.6500% of 27.9800% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.6400 i.e = 1.8500% of 27.9800% Groups non-Idle CPU time

Pinned Run
----------

Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
Bandwidth of Group 1 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 1/1 = 50.0100 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 1/2 = 49.9800 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time


Bandwidth of Group 2 = 6.2700 i.e = 6.2700% of non-Idle CPU time 100%
|...... subgroup 2/1 = 50.0000 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time
|...... subgroup 2/2 = 49.9900 i.e = 3.1300% of 6.2700% Groups non-Idle CPU time


Bandwidth of Group 3 = 12.5300 i.e = 12.5300% of non-Idle CPU time 100%
|...... subgroup 3/1 = 25.0100 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/2 = 25.0000 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/3 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time
|...... subgroup 3/4 = 24.9900 i.e = 3.1300% of 12.5300% Groups non-Idle CPU time


Bandwidth of Group 4 = 25.0200 i.e = 25.0200% of non-Idle CPU time 100%
|...... subgroup 4/1 = 12.5100 i.e = 3.1300% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/2 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/3 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/4 = 12.5000 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/5 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/6 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/7 = 12.4900 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time
|...... subgroup 4/8 = 12.4800 i.e = 3.1200% of 25.0200% Groups non-Idle CPU time


Bandwidth of Group 5 = 49.8800 i.e = 49.8800% of non-Idle CPU time 100%
|...... subgroup 5/1 = 49.9600 i.e = 24.9200% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/2 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/3 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/4 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/5 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/6 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/7 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/8 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/9 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/10 = 6.2500 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/11 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/12 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/13 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/14 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/15 = 6.2300 i.e = 3.1000% of 49.8800% Groups non-Idle CPU time
|...... subgroup 5/16 = 6.2400 i.e = 3.1100% of 49.8800% Groups non-Idle CPU time

Modified script
---------------

#!/bin/bash

NR_TASKS1=2
NR_TASKS2=2
NR_TASKS3=4
NR_TASKS4=8
NR_TASKS5=16

BANDWIDTH=1
SUBGROUP=1
PRO_SHARES=0
MOUNT_POINT=/cgroups/
MOUNT=/cgroups/
LOAD=./while1
LEVELS=3

usage()
{
echo "Usage $0: [-b 0|1] [-s 0|1] [-p 0|1]"
echo "-b 1|0 set/unset Cgroups bandwidth control (default set)"
echo "-s Create sub-groups for every task (default creates sub-group)"
echo "-p create propotional shares based on cpus"
exit
}
while getopts ":b:s:p:" arg
do
case $arg in
b)
BANDWIDTH=$OPTARG
shift
if [ $BANDWIDTH -gt 1 ] || [ $BANDWIDTH -lt 0 ]
then
usage
fi
;;
s)
SUBGROUP=$OPTARG
shift
if [ $SUBGROUP -gt 1 ] || [ $SUBGROUP -lt 0 ]
then
usage
fi
;;
p)
PRO_SHARES=$OPTARG
shift
if [ $PRO_SHARES -gt 1 ] || [ $PRO_SHARES -lt 0 ]
then
usage
fi
;;

*)

esac
done
if [ ! -d $MOUNT ]
then
mkdir -p $MOUNT
fi
test()
{
echo -n "[ "
if [ $1 -eq 0 ]
then
echo -ne '\E[42;40mOk'
else
echo -ne '\E[31;40mFailed'
tput sgr0
echo " ]"
exit
fi
tput sgr0
echo " ]"
}
mount_cgrp()
{
echo -n "Mounting root cgroup "
mount -t cgroup -ocpu,cpuset,cpuacct none $MOUNT_POINT &> /dev/null
test $?
}

umount_cgrp()
{
echo -n "Unmounting root cgroup "
cd /root/
umount $MOUNT_POINT
test $?
}

create_hierarchy()
{
mount_cgrp
cpuset_mem=`cat $MOUNT/cpuset.mems`
cpuset_cpu=`cat $MOUNT/cpuset.cpus`
echo -n "creating hierarchy of levels $LEVELS "
for (( i=1; i<=$LEVELS; i++ ))
do
MOUNT="${MOUNT}/level${i}"
mkdir $MOUNT
echo $cpuset_mem > $MOUNT/cpuset.mems
echo $cpuset_cpu > $MOUNT/cpuset.cpus
echo "-1" > $MOUNT/cpu.cfs_quota_us
echo "500000" > $MOUNT/cpu.cfs_period_us
echo -n " .."
done
echo " "
echo $MOUNT
echo -n "creating groups/sub-groups ..."
for (( i=1; i<=5; i++ ))
do
mkdir $MOUNT/$i
echo $cpuset_mem > $MOUNT/$i/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/cpuset.cpus
echo -n ".."
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
mkdir -p $MOUNT/$i/$j
echo $cpuset_mem > $MOUNT/$i/$j/cpuset.mems
echo $cpuset_cpu > $MOUNT/$i/$j/cpuset.cpus
echo -n ".."
done
fi
done
echo "."
}

cleanup()
{
pkill -9 while1 &> /dev/null
sleep 10
echo -n "Umount groups/sub-groups .."
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
rmdir $MOUNT/$i/$j
echo -n ".."
done
fi
rmdir $MOUNT/$i
echo -n ".."
done
cd $MOUNT
cd ../
for (( i=$LEVELS; i>=1; i-- ))
do
rmdir level$i
cd ../
done
echo " "
umount_cgrp
}

load_tasks()
{
for (( i=1; i<=5; i++ ))
do
jj=$(eval echo "\$NR_TASKS$i")
shares="1024"
if [ $PRO_SHARES -eq 1 ]
then
eval shares=$(echo "$jj * 1024" | bc)
fi
echo $shares > $MOUNT/$i/cpu.shares
for (( j=1; j<=$jj; j++ ))
do
echo "-1" > $MOUNT/$i/cpu.cfs_quota_us
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
if [ $SUBGROUP -eq 1 ]
then

$LOAD &
echo $! > $MOUNT/$i/$j/tasks
echo "1024" > $MOUNT/$i/$j/cpu.shares

if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/$j/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/$j/cpu.cfs_quota_us
fi
else
$LOAD &
echo $! > $MOUNT/$i/tasks
echo $shares > $MOUNT/$i/cpu.shares

if [ $BANDWIDTH -eq 1 ]
then
echo "500000" > $MOUNT/$i/cpu.cfs_period_us
echo "250000" > $MOUNT/$i/cpu.cfs_quota_us
fi
fi
done
done
echo "Capturing idle cpu time with vmstat...."
vmstat 2 100 &> vmstat_log &
}

pin_tasks()
{
cpu=0
count=1
for (( i=1; i<=5; i++ ))
do
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
if [ $count -gt 2 ]
then
cpu=$((cpu+1))
count=1
fi
echo $cpu > $MOUNT/$i/$j/cpuset.cpus
count=$((count+1))
done
else
case $i in
1)
echo 0 > $MOUNT/$i/cpuset.cpus;;
2)
echo 1 > $MOUNT/$i/cpuset.cpus;;
3)
echo "2-3" > $MOUNT/$i/cpuset.cpus;;
4)
echo "4-6" > $MOUNT/$i/cpuset.cpus;;
5)
echo "7-15" > $MOUNT/$i/cpuset.cpus;;
esac
fi
done

}

print_results()
{
eval gtot=$(cat sched_log|grep -i while|sed 's/R//g'|awk '{gtot+=$7};END{printf "%f", gtot}')
for (( i=1; i<=5; i++ ))
do
eval temp=$(cat sched_log_$i|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval tavg=$(echo "scale=4;(($temp / $gtot) * $1)/100 " | bc)
eval avg=$(echo "scale=4;($temp / $gtot) * 100" | bc)
eval pretty_tavg=$( echo "scale=4; $tavg * 100"| bc) # For pretty format
echo "Bandwidth of Group $i = $avg i.e = $pretty_tavg% of non-Idle CPU time $1%"
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
eval tmp=$(cat sched_log_$i-$j|sed 's/R//g'| awk '{gtot+=$7};END{printf "%f",gtot}')
eval stavg=$(echo "scale=4;($tmp / $temp) * 100" | bc)
eval pretty_stavg=$(echo "scale=4;(($tmp / $temp) * $tavg) * 100" | bc)
echo -n "|"
echo -e "...... subgroup $i/$j\t= $stavg\ti.e = $pretty_stavg% of $pretty_tavg% Groups non-Idle CPU time"
done
fi
echo " "
echo " "
done
}

capture_results()
{
cat /proc/sched_debug > sched_log
lev=""
for (( i=1; i<=$LEVELS; i++ ))
do
lev="$lev\/level${i}"
done
pkill -9 vmstat
avg=$(cat vmstat_log |grep -iv "system"|grep -iv "swpd"|awk ' { if ( NR != 1) {id+=$15 }}END{print (id/(NR-1))}')

rem=$(echo "scale=2; 100 - $avg" |bc)
echo "Average CPU Idle percentage $avg%"
echo "Bandwidth shared with remaining non-Idle $rem%"
for (( i=1; i<=5; i++ ))
do
cat sched_log |grep -i while1|grep -i "$lev\/$i" > sched_log_$i
if [ $SUBGROUP -eq 1 ]
then
jj=$(eval echo "\$NR_TASKS$i")
for (( j=1; j<=$jj; j++ ))
do
cat sched_log |grep -i while1|grep -i "$lev\/$i\/$j" > sched_log_$i-$j
done
fi
done
print_results $rem
}

create_hierarchy
pin_tasks

load_tasks
sleep 60
capture_results
cleanup
exit
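
A typical invocation of the script above would be something like (the script name
is illustrative; flags as described in usage()):

# ./cfs-bw-test.sh -b 1 -s 1 -p 0

i.e. bandwidth control enabled, one sub-group per task, and flat 1024 shares per group.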

2011-06-14 00:00:45

by Paul Turner

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

Hi Kamalesh.

I tried on both Friday and again today to reproduce your results
without success. Results are attached below. The margin of error is
the same as in the previous (2-level deep) case, ~4%. One minor nit: in
your script's input parsing you're calling shift; you don't need to do
this with getopts, and it will actually lead to arguments being
dropped.
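
A minimal sketch of the parsing loop without the extra shift (same options as the
script above; the range checks are omitted here for brevity):

while getopts ":b:s:p:" arg
do
        case $arg in
        b) BANDWIDTH=$OPTARG ;;
        s) SUBGROUP=$OPTARG ;;
        p) PRO_SHARES=$OPTARG ;;
        *) usage ;;
        esac
done
# getopts advances OPTIND itself; a single shift afterwards is only needed
# if positional arguments follow the options:
shift $((OPTIND - 1))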

Are you testing on top of a clean -tip? Do you have any custom
load-balancer or scheduler settings?

Thanks,

- Paul


Hyper-threaded topology:
unpinned:
Average CPU Idle percentage 38.6333%
Bandwidth shared with remaining non-Idle 61.3667%

pinned:
Average CPU Idle percentage 35.2766%
Bandwidth shared with remaining non-Idle 64.7234%
(The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
mirror your 2 socket 8x2 configuration.)

4-way NUMA topology:
unpinned:
Average CPU Idle percentage 5.26667%
Bandwidth shared with remaining non-Idle 94.73333%

pinned:
Average CPU Idle percentage 0.242424%
Bandwidth shared with remaining non-Idle 99.757576%




On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
<[email protected]> wrote:
> * Paul Turner <[email protected]> [2011-06-08 20:25:00]:
>
>> Hi Kamalesh,
>>
>> I'm unable to reproduce the results you describe. One possibility is
>> load-balancer interaction -- can you describe the topology of the
>> platform you are running this on?
>>
>> On both a straight NUMA topology and a hyper-threaded platform I
>> observe a ~4% delta between the pinned and un-pinned cases.
>>
>> Thanks -- results below,
>>
>> - Paul
>>
>>
> (snip)
>
> (snip)

2011-06-14 06:58:37

by Hu Tao

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

Hi,

I've run several tests including hackbench, unixbench, massive-intr
and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
4 cores, and 4G memory.

Most of the time the results differ little, but there are problems:

1. unixbench: execl throughput has about a 5% drop.
2. unixbench: process creation has about a 5% drop.
3. massive-intr: when running 200 processes for 5 mins, the number
of loops each process runs varies more across processes than before cfs-bandwidth-v6.

The results are attached.


Attachments:
massive-intr-200-300-without-patch.txt (2.72 kB)
massive-intr-200-300-with-patch.txt (3.12 kB)
massive-intr-200-60-without-patch.txt (2.55 kB)
massive-intr-200-60-with-patch.txt (3.05 kB)
massive-intr.png (78.04 kB)
unixbench-cfs-bandwidth-v6 (5.49 kB)
unixbench-without-cfs-bandwidth-v6 (5.50 kB)

2011-06-14 07:30:27

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

(2011/06/14 15:58), Hu Tao wrote:
> Hi,
>
> I've run several tests including hackbench, unixbench, massive-intr
> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> 4 cores, and 4G memory.
>
> Most of the time the results differ few, but there are problems:
>
> 1. unixbench: execl throughout has about 5% drop.
> 2. unixbench: process creation has about 5% drop.
> 3. massive-intr: when running 200 processes for 5mins, the number
> of loops each process runs differ more than before cfs-bandwidth-v6.
>
> The results are attached.

I know the score of unixbench is not so stable, so the problem might
be noise ... but the result of massive-intr is interesting.
Could you give it a try to find which piece (xx/15) in the series causes
the problems?

Thanks,
H.Seto

2011-06-14 07:44:46

by Hu Tao

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> (2011/06/14 15:58), Hu Tao wrote:
> > Hi,
> >
> > I've run several tests including hackbench, unixbench, massive-intr
> > and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> > 4 cores, and 4G memory.
> >
> > Most of the time the results differ few, but there are problems:
> >
> > 1. unixbench: execl throughout has about 5% drop.
> > 2. unixbench: process creation has about 5% drop.
> > 3. massive-intr: when running 200 processes for 5mins, the number
> > of loops each process runs differ more than before cfs-bandwidth-v6.
> >
> > The results are attached.
>
> I know the score of unixbench is not so stable that the problem might
> be noises ... but the result of massive-intr is interesting.
> Could you give a try to find which piece (xx/15) in the series cause
> the problems?

OK. I'll do it.

>
> Thanks,
> H.Seto

2011-06-14 10:17:38

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

(2011/06/08 0:45), Kamalesh Babulal wrote:
> Hi All,
>
> In our test environment, while testing the CFS Bandwidth V6 patch set
> on top of 55922c9d1b84. We observed that the CPU's idle time is seen
> between 30% to 40% while running CPU bound test, with the cgroups tasks
> not pinned to the CPU's. Whereas in the inverse case, where the cgroups
> tasks are pinned to the CPU's, the idle time seen is nearly zero.

I've done some tests with your test script but I'm not sure whether it is really
a considerable problem. Am I missing the point?

I added a -c option to your script to toggle pinning (1: pinned, 0: not pinned).
In short, the results in my environment (16 cpus, 4 quad cores) are:

# group's usage
-b 0 -p 0 -c 0 : Idle = 0% (12,12,25,25,25)
-b 0 -p 0 -c 1 : Idle = 0% (6,6,12,25,50)
-b 0 -p 1 -c * : Idle = 0% (6,6,12,25,50)
-b 1 -p 0 -c 0 : Idle = ~25% (6,6,12,25,25)
-b 1 -p 0 -c 1 : Idle = 0% (6,6,12,25,50)
-b 1 -p 1 -c * : Idle = 0% (6,6,12,25,50)

If my understanding is correct, when -p0 there are 5 groups (each with share=1024)
and the groups have 2,2,4,8,16 subgroups respectively, so a subgroup in /1 is weighted 8 times
higher than one in /5. And when -p1, the shares of the 5 parent groups are promoted
(to NR_TASKS * 1024) and all subgroups are effectively weighted evenly.
With -p0 the cpu usage of the 5 groups would tend toward 20,20,20,20,20, but groups /1
and /2 have only 2 subgroups each, so even if /1 and /2 each fully use 2 cpus
the usage will be 12,12,25,25,25.

OTOH the bandwidth of a subgroup is 250000/500000 (= 0.5 cpu), so in the case of
Idle=0% the cpu usage of the groups is likely to be 6,6,12,25,50%.
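
A quick back-of-the-envelope check of those numbers (0.5 cpu per subgroup, 16 cpus,
and 2,2,4,8,16 subgroups per group):

for n in 2 2 4 8 16; do echo "scale=2; $n * 0.5 * 100 / 16" | bc; done
# -> 6.25, 6.25, 12.50, 25.00, 50.00  i.e. roughly the 6,6,12,25,50% above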

The question is what happens if both are mixed.

For example, in the case of your unpinned run with Idle=34.8%:

> Average CPU Idle percentage 34.8% (as explained above in the Idle time measured)
> Bandwidth shared with remaining non-Idle 65.2%

> Bandwidth of Group 1 = 9.2500 i.e = 6.0300% of non-Idle CPU time 65.2%
> Bandwidth of Group 2 = 9.0400 i.e = 5.8900% of non-Idle CPU time 65.2%
> Bandwidth of Group 3 = 16.9300 i.e = 11.0300% of non-Idle CPU time 65.2%
> Bandwidth of Group 4 = 27.9300 i.e = 18.2100% of non-Idle CPU time 65.2%
> Bandwidth of Group 5 = 36.8300 i.e = 24.0100% of non-Idle CPU time 65.2%

The usage is 6,6,11,18,24.
It looks like groups /1 to /3 are limited by bandwidth, while group /5 is
limited by share. (I have no idea about the noise on /4 here.)

BTW, since the pinning in your script always pins a couple of subgroups from the same
group to a cpu, subgroups are weighted evenly everywhere, so as a result
share makes no difference in these cases.


Thanks,
H.Seto

2011-06-15 05:37:37

by Kamalesh Babulal

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

* Paul Turner <[email protected]> [2011-06-13 17:00:08]:

> Hi Kamalesh.
>
> I tried on both friday and again today to reproduce your results
> without success. Results are attached below. The margin of error is
> the same as the previous (2-level deep case), ~4%. One minor nit, in
> your script's input parsing you're calling shift; you don't need to do
> this with getopts and it will actually lead to arguments being
> dropped.
>
> Are you testing on top of a clean -tip? Do you have any custom
> load-balancer or scheduler settings?
>
> Thanks,
>
> - Paul
>
>
> Hyper-threaded topology:
> unpinned:
> Average CPU Idle percentage 38.6333%
> Bandwidth shared with remaining non-Idle 61.3667%
>
> pinned:
> Average CPU Idle percentage 35.2766%
> Bandwidth shared with remaining non-Idle 64.7234%
> (The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
> mirror your 2 socket 8x2 configuration.)
>
> 4-way NUMA topology:
> unpinned:
> Average CPU Idle percentage 5.26667%
> Bandwidth shared with remaining non-Idle 94.73333%
>
> pinned:
> Average CPU Idle percentage 0.242424%
> Bandwidth shared with remaining non-Idle 99.757576%
>
Hi Paul,

I tried tip 919c9baa9 + the V6 patchset on a 2-socket, quad-core box with HT and
the idle time seen is ~22% to ~23%. The kernel is not tuned with any custom
load-balancer/scheduler settings.

unpinned:
Average CPU Idle percentage 23.5333%
Bandwidth shared with remaining non-Idle 76.4667%

pinned:
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%

Thanks,

Kamalesh

2011-06-15 08:38:41

by Hu Tao

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> (2011/06/14 15:58), Hu Tao wrote:
> > (snip)
>
> I know the score of unixbench is not so stable that the problem might
> be noises ... but the result of massive-intr is interesting.
> Could you give a try to find which piece (xx/15) in the series cause
> the problems?

After more tests, I found that the massive-intr data is not stable either. Results
are attached. The third number in the file name indicates which patches are
applied; 0 means no patch applied. plot.sh makes it easy to generate the png
files.


Attachments:
massive-intr-200-300-0-1.txt (3.12 kB)
massive-intr-200-300-10.txt (2.78 kB)
massive-intr-200-300-11.txt (3.00 kB)
massive-intr-200-300-12.txt (3.12 kB)
massive-intr-200-300-13.txt (2.56 kB)
massive-intr-200-300-14.txt (2.94 kB)
massive-intr-200-300-15-1.txt (3.02 kB)
massive-intr-200-300-15-2.txt (3.11 kB)
massive-intr-200-300-15.txt (2.98 kB)
massive-intr-200-300-16-1.txt (2.56 kB)
massive-intr-200-300-1.txt (3.08 kB)
massive-intr-200-300-2.txt (3.05 kB)
massive-intr-200-300-3.txt (2.98 kB)
massive-intr-200-300-4.txt (3.05 kB)
massive-intr-200-300-5.txt (2.70 kB)
massive-intr-200-300-6.txt (3.00 kB)
massive-intr-200-300-7.txt (2.92 kB)
massive-intr-200-300-8.txt (3.12 kB)
massive-intr-200-300-9.txt (3.11 kB)
massive-intr-200-300-without-patch.txt (2.72 kB)
massive-intr-200-300-with-patch.txt (3.12 kB)
plot.sh (1.21 kB)

2011-06-16 00:58:12

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

(2011/06/15 17:37), Hu Tao wrote:
> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>> (2011/06/14 15:58), Hu Tao wrote:
>>> (snip)
>>
>> I know the score of unixbench is not so stable that the problem might
>> be noises ... but the result of massive-intr is interesting.
>> Could you give a try to find which piece (xx/15) in the series cause
>> the problems?
>
> After more tests, I found massive-intr data is not stable, too. Results
> are attached. The third number in file name means which patchs are
> applied, 0 means no patch applied. plot.sh is easy to generate png
> files.

(Though I don't know what the 16th patch of this series is, anyway)
I see that the results of 15, 15-1 and 15-2 are very different and that
15-2 is similar to without-patch.

One concern is whether this instability of the data is really caused by the
nature of your test (hardware, massive-intr itself, something running
in the background, etc.) or by a hidden piece in the bandwidth patch set.
Did you see "not stable" data when none of the patches was applied?
If not, which patch makes it unstable?


Thanks,
H.Seto

2011-06-16 09:45:51

by Hu Tao

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
> (2011/06/15 17:37), Hu Tao wrote:
> > On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> >> (2011/06/14 15:58), Hu Tao wrote:
> >>> (snip)
> >>
> >> I know the score of unixbench is not so stable that the problem might
> >> be noises ... but the result of massive-intr is interesting.
> >> Could you give a try to find which piece (xx/15) in the series cause
> >> the problems?
> >
> > After more tests, I found massive-intr data is not stable, too. Results
> > are attached. The third number in file name means which patchs are
> > applied, 0 means no patch applied. plot.sh is easy to generate png
> > files.
>
> (Though I don't know what the 16th patch of this series is, anyway)

the 16th patch is this: https://lkml.org/lkml/2011/5/23/503

> I see that the results of 15, 15-1 and 15-2 are very different and that
> 15-2 is similar to without-patch.
>
> One concern is whether this unstable of data is really caused by the
> nature of your test (hardware, massive-intr itself and something running
> in background etc.) or by a hidden piece in the bandwidth patch set.
> Did you see "not stable" data when none of patches is applied?

Yes.

But over five runs the result seems 'stable' (both before and after the
patches). I've also run the tests in single mode. Results are attached.


Attachments:
massive-intr-200-300-0-1.txt (3.12 kB)
massive-intr-200-300-0-2.txt (2.94 kB)
massive-intr-200-300-0-3.txt (2.98 kB)
massive-intr-200-300-0-4.txt (3.09 kB)
massive-intr-200-300-0-5.txt (2.78 kB)
massive-intr-200-300-16-1.txt (2.95 kB)
massive-intr-200-300-16-2.txt (3.11 kB)
massive-intr-200-300-16-3.txt (2.92 kB)
massive-intr-200-300-16-4.txt (2.95 kB)
massive-intr-200-300-16-5.txt (3.12 kB)
massive-intr-200-300-single-0-1.txt (3.03 kB)
massive-intr-200-300-single-0-2.txt (3.11 kB)
massive-intr-200-300-single-0-3.txt (2.86 kB)
massive-intr-200-300-single-0-4.txt (3.11 kB)
massive-intr-200-300-single-0-5.txt (3.08 kB)
massive-intr-200-300-single-16-1.txt (3.02 kB)
massive-intr-200-300-single-16-2.txt (3.00 kB)
massive-intr-200-300-single-16-3.txt (3.12 kB)
massive-intr-200-300-single-16-4.txt (3.03 kB)
massive-intr-200-300-single-16-5.txt (3.05 kB)
0-1.png (12.10 kB)
16-1.png (12.25 kB)
single-0-1.png (9.77 kB)
single-16-1.png (10.85 kB)

2011-06-17 01:23:55

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

(2011/06/16 18:45), Hu Tao wrote:
> On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
>> (2011/06/15 17:37), Hu Tao wrote:
>>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>>>> (2011/06/14 15:58), Hu Tao wrote:
>>>>> (snip)
>>>>
>>>> I know the score of unixbench is not so stable that the problem might
>>>> be noises ... but the result of massive-intr is interesting.
>>>> Could you give a try to find which piece (xx/15) in the series cause
>>>> the problems?
>>>
>>> After more tests, I found massive-intr data is not stable, too. Results
>>> are attached. The third number in file name means which patchs are
>>> applied, 0 means no patch applied. plot.sh is easy to generate png
>>> files.
>>
>> (Though I don't know what the 16th patch of this series is, anyway)

> the 16th patch is this: https://lkml.org/lkml/2011/5/23/503

I see. It will be replaced by Paul's update.

>> I see that the results of 15, 15-1 and 15-2 are very different and that
>> 15-2 is similar to without-patch.
>>
>> One concern is whether this unstable of data is really caused by the
>> nature of your test (hardware, massive-intr itself and something running
>> in background etc.) or by a hidden piece in the bandwidth patch set.
>> Did you see "not stable" data when none of patches is applied?
>
> Yes.
>
> But for a five-runs the result seems 'stable'(before patches and after
> patches). I've also run the tests in single mode. results are attached.

(It would be greatly appreciated if you could provide not only the raw results
but also your current observations/speculation.)

Well, (to wrap it up,) do you still see the following problem?

>>>>> 3. massive-intr: when running 200 processes for 5mins, the number
>>>>> of loops each process runs differ more than before cfs-bandwidth-v6.

I think that 5 samples are not enough to draw a conclusion, and that at the
moment the difference is negligible. What do you think?

Even if the problems you pointed out turn out not to be real issues, I have to
say thank you for taking your time to test this CFS bandwidth patch set.
I'd appreciate it if you could continue your testing, possibly against V7.
(I'm waiting, Paul?)


Thanks,
H.Seto

2011-06-17 06:06:03

by Hu Tao

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Fri, Jun 17, 2011 at 10:22:51AM +0900, Hidetoshi Seto wrote:
> (2011/06/16 18:45), Hu Tao wrote:
> > On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
> >> (2011/06/15 17:37), Hu Tao wrote:
> >>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
> >>>> (2011/06/14 15:58), Hu Tao wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I've run several tests including hackbench, unixbench, massive-intr
> >>>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
> >>>>> 4 cores, and 4G memory.
> >>>>>
> >>>>> Most of the time the results differ few, but there are problems:
> >>>>>
> >>>>> 1. unixbench: execl throughout has about 5% drop.
> >>>>> 2. unixbench: process creation has about 5% drop.
> >>>>> 3. massive-intr: when running 200 processes for 5mins, the number
> >>>>> of loops each process runs differ more than before cfs-bandwidth-v6.
> >>>>>
> >>>>> The results are attached.
> >>>>
> >>>> I know the score of unixbench is not so stable that the problem might
> >>>> be noises ... but the result of massive-intr is interesting.
> >>>> Could you give a try to find which piece (xx/15) in the series cause
> >>>> the problems?
> >>>
> >>> After more tests, I found massive-intr data is not stable, too. Results
> >>> are attached. The third number in file name means which patchs are
> >>> applied, 0 means no patch applied. plot.sh is easy to generate png
> >>> files.
> >>
> >> (Though I don't know what the 16th patch of this series is, anyway)
>
> I see. It will be replaced by Paul's update.
>
> > the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
> >
> >> I see that the results of 15, 15-1 and 15-2 are very different and that
> >> 15-2 is similar to without-patch.
> >>
> >> One concern is whether this unstable of data is really caused by the
> >> nature of your test (hardware, massive-intr itself and something running
> >> in background etc.) or by a hidden piece in the bandwidth patch set.
> >> Did you see "not stable" data when none of patches is applied?
> >
> > Yes.
> >
> > But for a five-runs the result seems 'stable'(before patches and after
> > patches). I've also run the tests in single mode. results are attached.
>
> (It will be appreciated greatly if you could provide not only raw results
> but also your current observation/speculation.)

Sorry I didn't make myself clear.

>
> Well, (to wrap it up,) do you still see the following problem?
>
> >>>>> 3. massive-intr: when running 200 processes for 5mins, the number
> >>>>> of loops each process runs differ more than before cfs-bandwidth-v6.

Even before applying the patches, the numbers differ a lot between
several runs of massive_intr; this is the reason I say the data is not
stable. But treating the results of five runs as a whole, it shows some
stability. The results after the patches are similar, and the average
loop counts differ little compared to the results before the patches
(compare 0-1.png and 16-1.png in my last mail). So I would say the patches
don't have much impact on interactive processes.

>
> I think that 5 samples are not enough to draw a conclusion, and that at the
> moment it is inconsiderable. How do you think?

At least 5 samples reveal something, but if you'd like I can take more
samples.

>
> Even though pointed problems are gone, I have to say thank you for taking
> your time to test this CFS bandwidth patch set.
> I'd appreciate it if you could continue your test, possibly against V7.
> (I'm waiting, Paul?)
>
>
> Thanks,
> H.Seto

Thanks,
--
Hu Tao

2011-06-17 06:26:10

by Paul Turner

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Thu, Jun 16, 2011 at 6:22 PM, Hidetoshi Seto
<[email protected]> wrote:
> (2011/06/16 18:45), Hu Tao wrote:
>> On Thu, Jun 16, 2011 at 09:57:09AM +0900, Hidetoshi Seto wrote:
>>> (2011/06/15 17:37), Hu Tao wrote:
>>>> On Tue, Jun 14, 2011 at 04:29:49PM +0900, Hidetoshi Seto wrote:
>>>>> (2011/06/14 15:58), Hu Tao wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've run several tests including hackbench, unixbench, massive-intr
>>>>>> and kernel building. CPU is Intel(R) Xeon(R) CPU X3430 @ 2.40GHz,
>>>>>> 4 cores, and 4G memory.
>>>>>>
>>>>>> Most of the time the results differ few, but there are problems:
>>>>>>
>>>>>> 1. unixbench: execl throughout has about 5% drop.
>>>>>> 2. unixbench: process creation has about 5% drop.
>>>>>> 3. massive-intr: when running 200 processes for 5mins, the number
>>>>>> of loops each process runs differ more than before cfs-bandwidth-v6.
>>>>>>
>>>>>> The results are attached.
>>>>>
>>>>> I know the score of unixbench is not so stable that the problem might
>>>>> be noises ... but the result of massive-intr is interesting.
>>>>> Could you give a try to find which piece (xx/15) in the series cause
>>>>> the problems?
>>>>
>>>> After more tests, I found massive-intr data is not stable, too. Results
>>>> are attached. The third number in file name means which patchs are
>>>> applied, 0 means no patch applied. plot.sh is easy to generate png
>>>> files.
>>>
>>> (Though I don't know what the 16th patch of this series is, anyway)
>
> I see. It will be replaced by Paul's update.
>
>> the 16th patch is this: https://lkml.org/lkml/2011/5/23/503
>>
>>> I see that the results of 15, 15-1 and 15-2 are very different and that
>>> 15-2 is similar to without-patch.
>>>
>>> One concern is whether this unstable of data is really caused by the
>>> nature of your test (hardware, massive-intr itself and something running
>>> in background etc.) or by a hidden piece in the bandwidth patch set.
>>> Did you see "not stable" data when none of patches is applied?
>>
>> Yes.
>>
>> But for a five-runs the result seems 'stable'(before patches and after
>> patches). I've also run the tests in single mode. results are attached.
>
> (It will be appreciated greatly if you could provide not only raw results
> but also your current observation/speculation.)
>
> Well, (to wrap it up,) do you still see the following problem?
>
>>>>>> 3. massive-intr: when running 200 processes for 5mins, the number
>>>>>> of loops each process runs differ more than before cfs-bandwidth-v6.
>
> I think that 5 samples are not enough to draw a conclusion, and that at the
> moment it is inconsiderable. How do you think?
>
> Even though pointed problems are gone, I have to say thank you for taking
> your time to test this CFS bandwidth patch set.
> I'd appreciate it if you could continue your test, possibly against V7.
> (I'm waiting, Paul?)

It should be out in a few hours. As I was preparing everything today I
realized a latent error existed in the quota expiration path;
specifically, on a wake-up from a sufficiently long sleep we will
see expired quota and have to wait for the timer to recharge bandwidth
before we're actually allowed to run. I'm currently munging the results
of fixing that and making sure everything else is correct in the wake
of those changes.
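
To make the expiration behaviour concrete, here is a minimal userspace
sketch of the logic being described; the names (toy_cfs_rq,
period_timer_refresh, can_run_at) and the numbers are purely illustrative
assumptions, not the code in the patch set:

#include <stdio.h>

struct toy_cfs_rq {
        long runtime_remaining;   /* locally cached quota, in ms for this toy */
        long runtime_expires;     /* time at which that quota goes stale */
};

/* Period timer path: hand out fresh quota and push out the expiry time. */
static void period_timer_refresh(struct toy_cfs_rq *rq, long now,
                                 long period, long quota)
{
        rq->runtime_remaining = quota;
        rq->runtime_expires = now + period;
}

/*
 * Wakeup/accounting path: quota left over from an old period is treated
 * as expired, so an entity waking from a long sleep finds nothing to run
 * on and has to wait for the next timer refresh.
 */
static int can_run_at(struct toy_cfs_rq *rq, long now)
{
        if (now >= rq->runtime_expires)
                rq->runtime_remaining = 0;   /* stale quota is discarded */
        return rq->runtime_remaining > 0;
}

int main(void)
{
        struct toy_cfs_rq rq;

        period_timer_refresh(&rq, 0, 500, 250);   /* 500ms period, 250ms quota */
        printf("wakeup at t=100:  runnable=%d\n", can_run_at(&rq, 100));
        /* a wakeup well past the period sees only expired quota */
        printf("wakeup at t=1200: runnable=%d\n", can_run_at(&rq, 1200));
        return 0;
}

The second wakeup is the case being described: presumably the fix lets it
pick up fresh bandwidth rather than idling until the timer fires again.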

>
>
> Thanks,
> H.Seto
>
>

2011-06-17 09:14:01

by Hidetoshi Seto

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

(2011/06/17 15:25), Paul Turner wrote:
> It should be out in a few hours, as I was preparing everything today I
> realized an latent error existed in the quota expiration path;
> specifically that on a wake-up from a sufficiently long sleep we will
> see expired quota and have to wait for the timer to recharge bandwidth
> before we're actually allowed to run. Currently munging the results
> of fixing that and making sure everything else is correct in the wake
> of those changes.

Thanks!
I'll check it some time early next week.


Thanks,
H.Seto

2011-06-18 00:28:59

by Paul Turner

[permalink] [raw]
Subject: Re: [patch 00/15] CFS Bandwidth Control V6

On Fri, Jun 17, 2011 at 2:13 AM, Hidetoshi Seto
<[email protected]> wrote:
> (2011/06/17 15:25), Paul Turner wrote:
>> It should be out in a few hours, as I was preparing everything today I
>> realized an latent error existed in the quota expiration path;
>> specifically that on a wake-up from a sufficiently long sleep we will
>> see expired quota and have to wait for the timer to recharge bandwidth
>> before we're actually allowed to run. Currently munging the results
>> of fixing that and making sure everything else is correct in the wake
>> of those changes.
>
> Thanks!
> I'll check it some time early next week.

So it's been a long session of hunting races and implementing the
cleanups above.

Unfortunately as my finger hovered over the send button I realized one
hurdle remains -- there's a narrow race in the period timer shutdown
path:

- Our period timer can decide that we're going idle as a result of no activity
- Right after it makes this decision a task sneaks in and runs on
another cpu. We can see the timer has chosen to go idle (it's
possible to synchronize on that state around the bandwidth lock) but
there's no good way to kick the period timer into an about-face since
it's already active.
- The timing is sufficiently rare and short that we could do something
awful like spin until the timer is complete, but I think it's probably
better to put a kick in one of our already existing recurring paths
such as update_shares.

I'll fix this after some sleep, I'm out of steam for now.
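
For what it's worth, the interleaving and the deferred kick can be modelled
roughly like this; it's a single-threaded toy with the bandwidth lock elided,
and every name in it is made up rather than taken from the patches:

#include <stdbool.h>
#include <stdio.h>

struct toy_bandwidth {
        bool timer_active;      /* is the period timer armed? */
        bool idle;              /* timer decided to shut itself down */
        bool kick_pending;      /* deferred request to restart it */
};

/* Period timer body: it saw no activity, so it decides to go idle. */
static void period_timer_fires(struct toy_bandwidth *b, bool saw_activity)
{
        if (!saw_activity) {
                b->idle = true;
                b->timer_active = false;   /* it will not rearm itself */
        }
}

/* Enqueue path: a task sneaks in right after the timer chose to go idle. */
static void enqueue_task(struct toy_bandwidth *b)
{
        if (b->idle) {
                /*
                 * Too late to turn the already-running timer around;
                 * just record that somebody has to restart it for us.
                 */
                b->kick_pending = true;
        }
}

/* An already existing recurring path (update_shares-like) delivers the kick. */
static void periodic_kick(struct toy_bandwidth *b)
{
        if (b->kick_pending && !b->timer_active) {
                b->timer_active = true;
                b->idle = false;
                b->kick_pending = false;
                printf("period timer restarted\n");
        }
}

int main(void)
{
        struct toy_bandwidth b = { .timer_active = true };

        period_timer_fires(&b, false);  /* timer decides to go idle */
        enqueue_task(&b);               /* ...and a task sneaks in right after */
        periodic_kick(&b);              /* deferred restart closes the window */
        return 0;
}

Spinning in the enqueue path until the timer completes would be the "awful"
alternative mentioned above.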


>
>
> Thanks,
> H.Seto
>
>

2011-06-21 19:48:51

by Paul Turner

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

Hi Kamalesh,

Can you see what things look like under v7?

There have been a few improvements to quota re-distribution that should
hopefully help your test case.

The remaining idle% I see on my machines appears to be a product of
load-balancer inefficiency.

Thanks!

- Paul

On Tue, Jun 14, 2011 at 10:37 PM, Kamalesh Babulal
<[email protected]> wrote:
> * Paul Turner <[email protected]> [2011-06-13 17:00:08]:
>
>> Hi Kamalesh.
>>
>> I tried on both friday and again today to reproduce your results
>> without success. Results are attached below. The margin of error is
>> the same as the previous (2-level deep case), ~4%. One minor nit, in
>> your script's input parsing you're calling shift; you don't need to do
>> this with getopts and it will actually lead to arguments being
>> dropped.
>>
>> Are you testing on top of a clean -tip? Do you have any custom
>> load-balancer or scheduler settings?
>>
>> Thanks,
>>
>> - Paul
>>
>>
>> Hyper-threaded topology:
>> unpinned:
>> Average CPU Idle percentage 38.6333%
>> Bandwidth shared with remaining non-Idle 61.3667%
>>
>> pinned:
>> Average CPU Idle percentage 35.2766%
>> Bandwidth shared with remaining non-Idle 64.7234%
>> (The mask in the "unpinned" case is 0-3,6-9,12-15,18-21 which should
>> mirror your 2 socket 8x2 configuration.)
>>
>> 4-way NUMA topology:
>> unpinned:
>> Average CPU Idle percentage 5.26667%
>> Bandwidth shared with remaining non-Idle 94.73333%
>>
>> pinned:
>> Average CPU Idle percentage 0.242424%
>> Bandwidth shared with remaining non-Idle 99.757576%
>>
> Hi Paul,
>
> I tried tip 919c9baa9 + V6 patchset on 2 socket,quadcore with HT and
> the Idle time seen is ~22% to ~23%. Kernel is not tuned to any custom
> load-balancer/scheduler settings.
>
> unpinned:
> Average CPU Idle percentage 23.5333%
> Bandwidth shared with remaining non-Idle 76.4667%
>
> pinned:
> Average CPU Idle percentage 0%
> Bandwidth shared with remaining non-Idle 100%
>
> Thanks,
>
> Kamalesh
>>
>>
>>
>> On Fri, Jun 10, 2011 at 11:17 AM, Kamalesh Babulal
>> <[email protected]> wrote:
>> > * Paul Turner <[email protected]> [2011-06-08 20:25:00]:
>> >
>> >> Hi Kamalesh,
>> >>
>> >> I'm unable to reproduce the results you describe. One possibility is
>> >> load-balancer interaction -- can you describe the topology of the
>> >> platform you are running this on?
>> >>
>> >> On both a straight NUMA topology and a hyper-threaded platform I
>> >> observe a ~4% delta between the pinned and un-pinned cases.
>> >>
>> >> Thanks -- results below,
>> >>
>> >> - Paul
>> >>
>> >>
> (snip)
>

2011-06-24 15:05:50

by Kamalesh Babulal

[permalink] [raw]
Subject: Re: CFS Bandwidth Control - Test results of cgroups tasks pinned vs unpinned

* Paul Turner <[email protected]> [2011-06-21 12:48:17]:

> Hi Kamalesh,
>
> Can you see what things look like under v7?
>
> There's been a few improvements to quota re-distribution that should
> hopefully help your test case.
>
> The remaining idle% I see on my machines appear to be a product of
> load-balancer inefficiency.
>
> Thanks!
>
> - Paul
(snip)

Hi Paul,

Sorry for the delay in the response. I tried the V7 patchset on
top of tip. The patchset passed build and boot tests in different
combinations.

I have re-run the tests with a couple of combinations on the same
2-socket, 4-core, HT box. The test data was collected over a
60-second run.

un-pinned and cpu shares of 1024
-------------------------------------------------
The five top-level cgroups and their sub-cgroups were assigned the default
cpu shares of 1024.

Average CPU Idle percentage 21.8333%
Bandwidth shared with remaining non-Idle 78.1667%


un-pinned and cpu shares are proportional
--------------------------------------------------
The five top-level cgroups were assigned cpu shares proportional to the
number of sub-cgroups under their hierarchy.
For example, cgroup1's share is (1024*2) = 2048 and each of its sub-cgroups
has a share of 1024.

Average CPU Idle percentage 14.2%
Bandwidth shared with remaining non-Idle 85.8%


pinned and cpu shares of 1024
--------------------------------------------------
Average CPU Idle percentage 0.0666667%
Bandwidth shared with remaining non-Idle 99.9333333%


pinned and cpu shares are proportional
--------------------------------------------------
Average CPU Idle percentage 0%
Bandwidth shared with remaining non-Idle 100%
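
For reference, a minimal sketch of how the proportional shares above could
be written through cgroupfs; the /cgroup/cpu mount point and the group paths
are assumptions about the local setup:

#include <stdio.h>
#include <stdlib.h>

/* Write a single integer value into a cgroupfs control file. */
static void write_knob(const char *path, long value)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fprintf(f, "%ld\n", value);
        fclose(f);
}

int main(void)
{
        /* cgroup1 has 2 sub-cgroups -> proportional share of 2 * 1024. */
        write_knob("/cgroup/cpu/cgroup1/cpu.shares", 2 * 1024);

        /* Each sub-cgroup keeps the default share of 1024. */
        write_knob("/cgroup/cpu/cgroup1/sub1/cpu.shares", 1024);
        write_knob("/cgroup/cpu/cgroup1/sub2/cpu.shares", 1024);
        return 0;
}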


I have captured the perf sched stats for every run. Let me
know if that will help. I can mail them to you privately.

Thanks,
Kamalesh.