Subject: Re: [PATCH v2] sched/fair: update scale invariance of PELT
To: Morten Rasmussen <morten.rasmussen@arm.com>,
        Vincent Guittot <vincent.guittot@linaro.org>
References: <1491815909-13345-1-git-send-email-vincent.guittot@linaro.org>
 <20170428155258.GA12012@e105550-lin.cambridge.arm.com>
Cc: peterz@infradead.org, mingo@kernel.org, linux-kernel@vger.kernel.org,
        yuyang.du@intel.com, pjt@google.com, bsegall@google.com
From: Dietmar Eggemann <dietmar.eggemann@arm.com>
Message-ID: <9e629311-7da1-afab-a493-3300f11836d8@arm.com>
Date: Fri, 28 Apr 2017 18:08:36 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.8.0
MIME-Version: 1.0
In-Reply-To: <20170428155258.GA12012@e105550-lin.cambridge.arm.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2190
Lines: 60

On 28/04/17 16:52, Morten Rasmussen wrote:
> Hi Vincent,

[...]

> As mentioned above, waiting time, i.e. !running && weight, is not
> scaled, which causes trouble for load.

I ran some rt-app-based tests on a system with frequency and cpu invariance.

(1) Two periodic 20% tasks with 12ms period on a cpu (capacity=1024) at
625MHz (max 1100MHz) starting to run at the same time, so one task
(task1) is wakeup preempted. (I'm only considering the phase of the test
run where this is a stable condition, i.e. task1 is always wakeup
preempted by task2).

So the runtime of a task is 0.2*12ms*1100/625 = 4.2ms.
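
The runtime figure above can be reproduced with a small sketch. The
function name and parameters are illustrative only (they are not taken
from rt-app or the kernel), and it assumes the work per period is fixed
at max frequency and stretches inversely with the current frequency:

```python
def runtime_ms(duty, period_ms, max_freq_mhz, cur_freq_mhz):
    """Wall-clock runtime per period for a task whose duty cycle is
    defined at max frequency (hypothetical helper, for illustration)."""
    # The amount of work is fixed; a slower clock stretches the time needed.
    return duty * period_ms * max_freq_mhz / cur_freq_mhz


print(round(runtime_ms(0.2, 12, 1100, 625), 1))   # 4.2
print(round(runtime_ms(0.2, 12, 1100, 1100), 1))  # 2.4
```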

At the beginning of the preemption period, __update_load_avg_se(task1)
is called with running=0 and weight=0, at the end with running=0 and
weight=1024.

When task1 finally runs there are two calls with (running=1,
weight=1024) before the next wakeup preemption period for task1 starts
again with (running=0, weight=0).

Task2, which doesn't suffer from wakeup preemption, starts running
with (running=0, weight=0), then there are 2 calls with (running=1,
weight=1024) before it starts running again with (running=0, weight=0).

Task1 is runnable for 8.4ms and sleeps for 3.6ms whereas task2 is
runnable for 4.2ms and sleeps for 7.8ms.
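
These runnable and sleep windows follow from the 4.2ms runtime and the
12ms period. A minimal sketch, assuming the preempted task waits for
exactly one full runtime of the other task before it runs (the helper
name is made up for illustration):

```python
def windows_ms(runtime, period, preempted):
    """Return (runnable, sleep) time per period in ms (hypothetical helper)."""
    # A wakeup-preempted task first waits for the other task's runtime,
    # then runs for its own runtime; both phases count as runnable.
    runnable = 2 * runtime if preempted else runtime
    return runnable, period - runnable


for name, preempted in (("task1", True), ("task2", False)):
    runnable, sleep = windows_ms(4.2, 12.0, preempted)
    print(f"{name}: runnable {runnable:.1f}ms, sleeps {sleep:.1f}ms")
```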

The load signal of task1 is ~600 whereas the load of task2 is ~200.

(2) Two periodic 20% tasks with 12ms period on a cpu (capacity=1024) at
1100MHz (max 1100MHz) starting to run at the same time, so one task
(task1) is wakeup preempted.

So the runtime of one task is 0.2*12ms*1100/1100 = 2.4ms.

Task1 is runnable for 4.8ms and sleeps for 7.2ms whereas task2 is
runnable for 2.4ms and sleeps for 9.6ms.

The load signal of task1 is ~400 whereas the load of task2 is ~200.

Like Morten said, the scaling for load works differently on different
OPPs. Scaling for utilization looks fine.
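
One rough way to see the OPP dependence is a plain duty-cycle estimate.
This is not PELT (it ignores the geometric decay) and it is not the code
under review. It only assumes, as Morten describes, that running time is
scaled by cur_freq/max_freq while waiting time is left unscaled. The
function name and the model itself are assumptions for illustration:

```python
def approx_load(duty, period_ms, max_freq, cur_freq, waits, weight=1024):
    """Duty-cycle estimate of a task's load (simplified model, not PELT)."""
    run = duty * period_ms * max_freq / cur_freq  # wall-clock running time
    wait = run if waits else 0.0                  # preempted by an equal task
    scaled_run = run * cur_freq / max_freq        # running time is scaled
    # Waiting time enters unscaled, which is the point being illustrated.
    return weight * (wait + scaled_run) / period_ms


for freq in (625, 1100):
    t1 = approx_load(0.2, 12, 1100, freq, waits=True)
    t2 = approx_load(0.2, 12, 1100, freq, waits=False)
    print(f"{freq}MHz: task1 ~{t1:.0f}, task2 ~{t2:.0f}")
```

Under these assumptions the estimate for task2 is the same at both
frequencies, while the estimate for task1 is higher at 625MHz than at
1100MHz, which is the same direction as the measured ~600 vs ~400.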

IMHO, the implementation of your scale_time() function can't take
preemption into consideration.

I also ran tests comparing the time_scaling implementation with tip
(contribution scaling), using two periodic tasks at 20%/16ms and at
20%/32ms, each at 625MHz and 1100MHz. They show this same load behaviour
as a difference between time_scaling and tip.

-- Dietmar

[...]