From: Vincent Guittot
Date: Wed, 13 Apr 2016 18:20:56 +0200
Subject: Re: [PATCH 2/4] sched/fair: Drop out incomplete current period when sched averages accrue
To: Dietmar Eggemann
Cc: Yuyang Du, Peter Zijlstra, Ingo Molnar, linux-kernel, Benjamin Segall, Paul Turner, Morten Rasmussen, Juri Lelli

On 13 April 2016 at 17:28, Vincent Guittot wrote:
> On 13 April 2016 at 17:13, Dietmar Eggemann wrote:
>> On 10/04/16 23:36, Yuyang Du wrote:
>>> In __update_load_avg(), the current period is never complete. This
>>> basically leads to a slightly over-decayed average: say on average we
>>> have 50% of the current period, then we lose 1.08% (= 1 - 0.5^(1/64))
>>> of the past avg. More importantly, the incomplete current period
>>> significantly complicates the avg computation, even though a full
>>> period is only about 1ms.
>>>
>>> So we attempt to drop it. The outcome is that for any x.y periods to
>>> update, we will either lose the .y period or unduly gain the (1 - .y)
>>> period. How big is the impact? For a large x (say 20ms), you barely
>>> notice the difference, which is plus/minus 1% (= (before-after)/before).
>>> Moreover, the aggregated losses and gains in the long run should
>>> statistically even out.
>>>
>>
>> For a periodic task, the signals really get much more unstable. Even for
>> a steady-state (load/util related) periodic task there is a meander
>> pattern which depends on whether we, for instance, hit a dequeue (decay +
>> accrue) or an enqueue (decay only) after the 1ms has elapsed.
>>
>> IMHO, 1ms is too coarse a granularity for signals describing task and cpu
>> load/util, given the current scheduler dynamics. We simply see too many
>> signal-driving points (e.g. enqueue/dequeue) bailing out of
>> __update_load_avg().
>>
>> Examples of 1 periodic task pinned to a cpu on an ARM64 system, HZ=250,
>> in steady state:
>>
>> (1) task runtime = 100us, period = 200us
>>
>> pelt load/util signal
>>
>> 1us: 488-491
>>
>> 1ms: 483-534
>>
>> We get ~2 dequeues (load/util example: 493->504) and ~2 enqueues
>> (load/util example: 496->483) in the meander pattern in the 1ms case.
>>
>> (2) task runtime = 100us, period = 1000us
>>
>> pelt load/util signal
>>
>> 1us: 103-105
>>
>> 1ms: 84-145
>>
>> We get ~3-4 dequeues (load/util example: 104->124->134->140) and ~16-20
>> enqueues (load/util example: 137->134->...->99->97) in the meander
>> pattern in the 1ms case.
>
> Yes, similarly I have some use cases with a 2ms running task in a period
> of 5.12ms. It will be seen either as a 1ms running task or a 2ms

Sorry, it's 5.242ms, not 5.12ms.

> running task, depending on how the running is synced with the 1ms
> boundary.
>
> So the load will vary between 197-215 up to 396-423, depending on when
> the 1ms boundary occurs within the 2ms of running.
>
> Vincent
>
>>
>> [...]
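
For reference, a quick back-of-envelope check of the numbers quoted above. This is only a floating-point sketch assuming the standard PELT decay (y^32 = 1/2, one period = 1024us), not the kernel's fixed-point __update_load_avg() code, and the task parameters are just the examples from this thread:

	#include <stdio.h>
	#include <math.h>

	int main(void)
	{
		/* PELT decay per full 1024us period: y such that y^32 = 0.5 */
		double y = pow(0.5, 1.0 / 32.0);

		/*
		 * Dropping an average of half a period decays the past sum by
		 * y^(1/2) = 0.5^(1/64), i.e. it loses about 1.08% of it.
		 */
		printf("loss from dropping half a period: %.2f%%\n",
		       (1.0 - sqrt(y)) * 100.0);

		/*
		 * 2ms of running every 5.242ms: depending on how the running
		 * lines up with the 1ms boundary it is accounted as roughly
		 * 1ms or 2ms, so the steady-state signal sits near one of
		 * these two duty-cycle estimates (scale 1024).
		 */
		printf("seen as 1ms/5.242ms: ~%.0f\n", 1024.0 * 1.0 / 5.242);
		printf("seen as 2ms/5.242ms: ~%.0f\n", 1024.0 * 2.0 / 5.242);

		return 0;
	}

Built with "gcc -lm", the first line reproduces the 1.08% figure from the patch description, and the last two (~195 and ~391) bracket the 197-215 and 396-423 ranges mentioned above.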