Subject: Re: Scheduler accounting inflated for io bound processes.
From: Mike Galbraith
To: Dave Chiluk
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
Date: Tue, 25 Jun 2013 18:01:44 +0200
Message-ID: <1372176104.7497.86.camel@marge.simpson.net>
In-Reply-To: <51C35C05.1070005@canonical.com>

On Thu, 2013-06-20 at 14:46 -0500, Dave Chiluk wrote:
> Running the below testcase shows each process consuming 41-43% of its
> respective cpu while per core idle numbers show 63-65%, a disparity of
> roughly 4-8%.  Is this a bug, known behaviour, or a consequence of the
> process being io bound?

All three, I suppose.  Idle is indeed inflated when softirq load is
present, and what exact numbers you see depends on your ACCOUNTING
config.  There are lies, there are damn lies.. and there are statistics.

> 1. run sudo taskset -c 0 netserver
> 2. run taskset -c 1 netperf -H localhost -l 3600 -t TCP_RR &
>    (start netperf with priority on cpu1)
> 3. run top, press 1 for multiple CPUs to be separated

CONFIG_TICK_CPU_ACCOUNTING, cpu[23] isolated:

cgexec -g cpuset:rtcpus netperf.sh 999&sleep 300 && killall -9 top

%Cpu2 :  6.8 us, 42.0 sy,  0.0 ni, 42.0 id,  0.0 wa,  0.0 hi,  9.1 si,  0.0 st
%Cpu3 :  5.6 us, 43.3 sy,  0.0 ni, 40.0 id,  0.0 wa,  0.0 hi, 11.1 si,  0.0 st
                                   ^^^^
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
 7226 root      20   0  8828  336  192 S 57.6  0.0   2:49.40 3 netserver   100*(2*60+49.4)/300  = 56.4
 7225 root      20   0  8824  648  504 R 55.6  0.0   2:46.55 2 netperf     100*(2*60+46.55)/300 = 55.5

Ok, accumulated time ~agrees with the %CPU snapshots.

cgexec -g cpuset:rtcpus taskset -c 3 schedctl -I pert 5

(pert is a self-calibrating tsc tight-loop perturbation measurement
proggy that enters the kernel once per 5s period for a write.  It
doesn't care about post-period stats processing/output time, but it's
running SCHED_IDLE, so it gets VERY little CPU when competing and runs
more or less only when netserver is idle.  Plenty good enough proxy for
idle.)

...
cgexec -g cpuset:rtcpus netperf.sh 9999
...
pert/s:    81249 >17.94us:    24 min:  0.08 max: 33.89 avg:  8.24 sum/s:669515us overhead:66.95%
pert/s:    81151 >18.43us:    25 min:  0.14 max: 37.53 avg:  8.25 sum/s:669505us overhead:66.95%
                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The pert userspace tsc loop gets ~32% ~= idle upper bound, reported idle
is ~40%, disparity ~8%.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
23067 root      20   0  8828  340  196 R 57.5  0.0   0:19.15 3 netserver
23040 root      20   0  8208  396  304 R 42.7  0.0   0:35.61 3 pert
                                         ^^^^ ~10% disparity.
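(What such an idle proxy looks like, roughly: below is a minimal sketch
of the idea, not the actual pert source.  It substitutes clock_gettime()
for pert's raw tsc reads and a hand-waved fixed threshold for its
self-calibration, but the principle is the same: spin under SCHED_IDLE,
charge every iteration that took much longer than the loop itself to
"perturbation", and treat the clean spin time as an upper bound on idle.
CLOCK_MONOTONIC goes through the vDSO, so the hot loop stays in
userspace and only the once-per-period printf enters the kernel.)

/* pert-alike idle proxy -- a sketch only, NOT the real pert source.
 * The threshold and period below are made up; real pert self-calibrates.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>

static inline long long now_ns(void)
{
	struct timespec ts;

	/* vDSO call, no kernel entry in the hot loop */
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
	struct sched_param sp = { .sched_priority = 0 };
	const long long period = 5000000000LL;	/* report every 5s, like pert */
	const long long thresh = 2000;		/* ns, hand-waved loop-cost cutoff */
	long long start, prev, t, spin = 0, lost = 0;

	/* run only when nothing else wants the CPU */
	sched_setscheduler(0, SCHED_IDLE, &sp);

	start = prev = now_ns();
	for (;;) {
		t = now_ns();
		if (t - prev > thresh)
			lost += t - prev;	/* we were preempted or interrupted */
		else
			spin += t - prev;	/* clean userspace spin */
		prev = t;

		if (t - start >= period) {
			printf("idle proxy: %5.1f%%  perturbed: %5.1f%%\n",
			       100.0 * spin / (t - start),
			       100.0 * lost / (t - start));
			/* don't charge stats/output time to either bucket */
			spin = lost = 0;
			start = prev = now_ns();
		}
	}
}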
perf record -e irq:softirq* -a -C 3 -- sleep 00
perf report --sort=comm

    99.80%  netserver
     0.20%  pert

pert does ~zip softirq processing (timer+rcu) and ~zip squat kernel.

Repeat.

cgexec -g cpuset:rtcpus netperf.sh 3600

pert/s:    80860 >474.34us:     0 min:  0.06 max: 35.26 avg:  8.28 sum/s:669197us overhead:66.92%
pert/s:    80897 >429.20us:     0 min:  0.14 max: 37.61 avg:  8.27 sum/s:668673us overhead:66.87%
pert/s:    80800 >388.26us:     0 min:  0.14 max: 31.33 avg:  8.26 sum/s:667277us overhead:66.73%

%Cpu3 : 36.3 us, 51.5 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi, 12.1 si,  0.0 st
        ^^^^ ~agrees with pert

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
23569 root      20   0  8828  340  196 R 57.2  0.0   0:21.97 3 netserver
23040 root      20   0  8208  396  304 R 42.9  0.0   6:46.20 3 pert
                                         ^^^^ pert is VERY nearly 100% userspace

one of those numbers is a.. statistic

Kills pert...

%Cpu3 :  3.4 us, 42.5 sy,  0.0 ni, 41.4 id,  0.1 wa,  0.0 hi, 12.5 si,  0.0 st
         ^^^ ~agrees that pert's us claim did go away, but wth is up with
             sy, it dropped ~9% after killing a ~100% us proggy.  nak

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P COMMAND
23569 root      20   0  8828  340  196 R 56.6  0.0   2:50.80 3 netserver

Yup, adding softirq load turns utilization numbers into.. statistics.
Pure cpu load idle numbers look fine.

	-Mike