From: Joel Fernandes
Date: Thu, 10 Aug 2017 08:22:37 -0700
Subject: Re: [PATCH] sched/fair: Make PELT signal more accurate
To: Vincent Guittot
Cc: linux-kernel, kernel-team@android.com, Peter Zijlstra, Juri Lelli,
    Brendan Jackman, Dietmar Eggeman, Ingo Molnar

On Thu, Aug 10, 2017 at 12:17 AM, Vincent Guittot wrote:
> On 9 August 2017 at 19:51, Joel Fernandes wrote:
>> Hi Vincent,
>>
>> On Wed, Aug 9, 2017 at 3:23 AM, Vincent Guittot wrote:
>>
>>>>
>>>> Yes, this is true. However, since I'm using the 'delta' instead of
>>>> period_contrib, it only does the update every 128us; if several
>>>> updates fall within a 128us boundary, those will be rate limited.
>>>> So say we have a flood of updates: the updates then have to be
>>>> spaced every 128us to reach the maximum number of divisions. I
>>>> don't know whether this is a likely situation or would happen very
>>>> often. I am planning to run some benchmarks and check that there
>>>> is no regression, as Peter mentioned regarding the performance
>>>> aspect.
>>>>
>>>>>> In order to compare the signals with/without the patch I created
>>>>>> a synthetic test (20ms runtime, 100ms period), analyzed the
>>>>>> signals and created a report on the analysis data/plots both
>>>>>> with and without the fix:
>>>>>> http://www.linuxinternals.org/misc/pelt-error.pdf
>>>>>
>>>>> The glitch described on page 2 shows a decrease of util_avg which
>>>>> is not linked to the accuracy of the calculation but is due to
>>>>> the use of the wrong range when computing util_avg.
>>>>
>>>> Yes, and I corrected the graphs this time to show what it's like
>>>> after your patch and to confirm that there is STILL a glitch. You
>>>> are right that there isn't a reduction after your patch; however,
>>>> in my updated graphs there is a glitch, and it's not a downward
>>>> peak but a stall in the update. The error is still quite high and
>>>> can be as high as the absolute 2% error; in my updated graphs I
>>>> show an example where it's ~1.8% (18 / 1024).
>>>>
>>>> Could you please take a look at my updated document? I have
>>>> included new graphs and traces there and color coded them so it's
>>>> easy to correlate the trace lines to the error in the graph.
>>>> Here's the updated new link:
>>>> https://github.com/joelagnel/joelagnel.github.io/blob/master/misc/pelt-error-rev2.pdf
>>>
>>> I see strange behavior in your rev2 document:
>>> At timestamp 9.235635, we have util_avg=199 acc_util_avg=199
>>> util_err=0. Everything looks fine, but I don't see this point on
>>> the graph.
>>> Then, at 9.235636 (which is the red colored faulty trace), we have
>>> util_avg=182 acc_util_avg=200 util_err=18.
>>> Firstly, this means that util_avg has been updated (199 -> 182), so
>>> the error is not a problem of util_avg not being updated often
>>> enough :-)
>>> Then, util_avg decreases (199 -> 182) whereas it must increase
>>> because the task is still running.
>>> This should not happen, and this is exactly what commit
>>> 625ed2bf049d should fix. So either the patch is not applied or it
>>> doesn't completely fix the problem.
>>
>> I think you are looking at the wrong trace lines. The graph is
>> generated for rq util only (cfs_rq == 1), so the lines in the traces
>> you should look at are the ones with cfs_rq=1. Only cfs_rq==1 lines
>> were used to generate the graphs.
>
> Ah! This is quite confusing; it is not obvious that the trace is not
> for 1 signal but that in fact 2 signals are interleaved and only 1 is
> displayed, and that we have to filter them.

Sorry, it's my fault that I didn't make this fact clear enough in the
doc :-/. Thanks for your patience.

> So OK, I can see that the trace with cfs_rq=1 is not updated. By
> contrast, we can see that the other trace (for the se, I assume) is
> updated normally, whereas they are normally synced on the same clock.

Ah, ok. I also checked that the error can happen for the se as well,
in the following example for cfs_rq=0:

Lines filtered:

task_tick_fair: pelt_update: util_avg=359 load_avg=364 acc_load_avg=364 acc_util_avg=359 util_err=0 load_err=0 load_sum=17041923 sum_err=0 delta_us=977 cfs_rq=0 ret=1
task_tick_fair: pelt_update: util_avg=373 load_avg=377 acc_load_avg=377 acc_util_avg=373 util_err=0 load_err=0 load_sum=17656717 sum_err=0 delta_us=978 cfs_rq=0 ret=1
task_tick_fair: pelt_update: util_avg=373 load_avg=377 acc_load_avg=390 acc_util_avg=386 util_err=13 load_err=13 load_sum=18651021 sum_err=-994304 delta_us=971 cfs_rq=0
dequeue_task_fair: pelt_update: util_avg=396 load_avg=400 acc_load_avg=400 acc_util_avg=396 util_err=0 load_err=0 load_sum=18987624 sum_err=0 delta_us=720 cfs_rq=0 ret=1

So here we have 359, 373, 373, 396.

Lines unfiltered:

task_tick_fair: pelt_update: util_avg=349 load_avg=413 acc_load_avg=413 acc_util_avg=349 util_err=0 load_err=0 load_sum=19460993 sum_err=0 delta_us=980 cfs_rq=1 ret=1
task_tick_fair: pelt_update: util_avg=359 load_avg=364 acc_load_avg=364 acc_util_avg=359 util_err=0 load_err=0 load_sum=17041923 sum_err=0 delta_us=977 cfs_rq=0 ret=1
task_tick_fair: pelt_update: util_avg=363 load_avg=426 acc_load_avg=426 acc_util_avg=363 util_err=0 load_err=0 load_sum=20029072 sum_err=0 delta_us=977 cfs_rq=1 ret=1
task_tick_fair: pelt_update: util_avg=373 load_avg=377 acc_load_avg=377 acc_util_avg=373 util_err=0 load_err=0 load_sum=17656717 sum_err=0 delta_us=978 cfs_rq=0 ret=1
task_tick_fair: pelt_update: util_avg=376 load_avg=438 acc_load_avg=438 acc_util_avg=376 util_err=0 load_err=0 load_sum=20584978 sum_err=0 delta_us=978 cfs_rq=1 ret=1
task_tick_fair: pelt_update: util_avg=373 load_avg=377 acc_load_avg=390 acc_util_avg=386 util_err=13 load_err=13 load_sum=18651021 sum_err=-994304 delta_us=971 cfs_rq=0 ret=0
task_tick_fair: pelt_update: util_avg=389 load_avg=450 acc_load_avg=450 acc_util_avg=389 util_err=0 load_err=0 load_sum=21120780 sum_err=0 delta_us=971 cfs_rq=1 ret=1
dequeue_task_fair: pelt_update: util_avg=396 load_avg=400 acc_load_avg=400 acc_util_avg=396 util_err=0 load_err=0 load_sum=18987624 sum_err=0 delta_us=720 cfs_rq=0 ret=1
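As a rough sanity check on how large these errors can get: with the
usual 1024us PELT period, up to ~1023us of the current (not yet
completed) period can be missing from util_sum, and taking
LOAD_AVG_MAX = 47742 with the 0..1024 utilization scale, that works
out to at most about:

    1023 * 1024 / 47742 ~= 22    (~2.1% of 1024)

so the util_err=13 above and the ~1.8% (18 / 1024) case from the rev2
document are both within that theoretical worst case (the "max
absolute value of 22" mentioned further down in the thread).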
I hope to finish the perf tests today that Peter discussed and provide
an update, thanks!

-Joel

>> In this you will see rq util_avg change as follows: 165 -> 182 ->
>> 182 (missed an update causing error). This is also reflected in the
>> graph where you see the flat green line.
>>
>>>
>>> It would be interesting to also display the last_update_time of
>>> sched_avg.
>>>
>>>>
>>>>> commit 625ed2bf049d "sched/cfs: Make util/load_avg more stable"
>>>>> fixes this glitch.
>>>>> And the lower peak value on page 3 is probably linked to the
>>>>> inaccuracy.
>>>>
>>>> This is not true. The reduction in peak in my tests, which happens
>>>> even after your patch, is because of the dequeue that happens just
>>>> before the period boundary is hit. Could you please take a look at
>>>> the updated document in the link above? There, in the second
>>>> example, I show a trace that corresponds to the reduction in peak
>>>> during the dequeue, which is because of the delay in the update.
>>>> These errors go away with my patch.
>>>
>>> There is the same strange behavior there:
>>> When the reduction in peak happens, the util_avg is updated,
>>> whereas your concern is that util_avg is not updated often enough.
>>> At timestamp 10.656683, we have util_avg=389 acc_util_avg=389
>>> util_err=0.
>>> At timestamp 10.657420, we have util_avg=396 acc_util_avg=396
>>> util_err=0. I don't see this point on the graph.
>>> At timestamp 10.657422, we have util_avg=389 acc_util_avg=399
>>> util_err=10. This is the colored faulty trace, but util_avg has
>>> been updated from 396 to 389.
>>
>> Yeah, same thing here: you should look at the lines with
>> cfs_rq == 1. The util changes as: 363 -> 376 -> 389 -> 389 (missed
>> update).
>>
>>
>> thanks,
>>
>> -Joel
>>
>>
>>>
>>> Regards,
>>> Vincent
>>>
>>>>
>>>>> I agree that there is an inaccuracy (the max absolute value of
>>>>> 22), but that's in favor of less overhead. Have you seen wrong
>>>>> behavior because of this inaccuracy?
>>>>
>>>> I haven't tried to nail this down to a wrong behavior; however,
>>>> since other patches have been posted to fix inaccuracy, and I do
>>>> see that we reach the theoretical maximum error on quite a few
>>>> occasions, I think it's justifiable. Also, the overhead is minimal
>>>> if updates aren't happening several times in a 128us window; the
>>>> few times that the update does happen, the division is performed
>>>> only at those times. So in cases where it does fix the error, it
>>>> does so with minimal overhead. I do agree with the overhead point
>>>> and I'm planning to do more tests with hackbench to confirm the
>>>> overhead is minimal. I'll post some updates about it soon.
>>>>
>>>> Thanks!
>>>>
>>>> -Joel
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> With the patch, the error in the signal is significantly
>>>>>> reduced, and is non-existent beyond a small negligible amount.
>>>>>>
>>>>>> Cc: Vincent Guittot
>>>>>> Cc: Peter Zijlstra
>>>>>> Cc: Juri Lelli
>>>>>> Cc: Brendan Jackman
>>>>>> Cc: Dietmar Eggeman
>>>>>> Signed-off-by: Joel Fernandes
>>>>>> ---
>>>>>>  kernel/sched/fair.c | 8 ++++++--
>>>>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>>>>>> index 4f1825d60937..1347643737f3 100644
>>>>>> --- a/kernel/sched/fair.c
>>>>>> +++ b/kernel/sched/fair.c
>>>>>> @@ -2882,6 +2882,7 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>>>                   unsigned long weight, int running, struct cfs_rq *cfs_rq)
>>>>>>  {
>>>>>>         u64 delta;
>>>>>> +       int periods;
>>>>>>
>>>>>>         delta = now - sa->last_update_time;
>>>>>>         /*
>>>>>> @@ -2908,9 +2909,12 @@ ___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
>>>>>>          * accrues by two steps:
>>>>>>          *
>>>>>>          * Step 1: accumulate *_sum since last_update_time. If we haven't
>>>>>> -        * crossed period boundaries, finish.
>>>>>> +        * crossed period boundaries and the time since last update is small
>>>>>> +        * enough, we're done.
>>>>>>          */
>>>>>> -       if (!accumulate_sum(delta, cpu, sa, weight, running, cfs_rq))
>>>>>> +       periods = accumulate_sum(delta, cpu, sa, weight, running, cfs_rq);
>>>>>> +
>>>>>> +       if (!periods && delta < 128)
>>>>>>                 return 0;
>>>>>>
>>>>>>         /*
>>>>>> --
>>>>>> 2.14.0.rc1.383.gd1ce394fe2-goog
>>>>>>
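In case it helps when skimming the hunk above, here is a tiny
standalone model of the check it adds (just a sketch, not kernel code:
should_update_avg() and UPDATE_THRESHOLD_US are made-up names for
illustration, and delta_us stands for the delta value
___update_load_avg() works with after its ns-to-us shift):

#include <stdbool.h>

#define UPDATE_THRESHOLD_US     128     /* the threshold the patch picks */

/*
 * Model of the early-return condition: the averages are recomputed when
 * at least one full 1024us period has been crossed (periods != 0), or
 * when roughly 128us or more have passed since the last update.
 * Otherwise the function returns early and only the sums keep
 * accumulating, so the averages can lag the sums by at most ~128us
 * worth of contribution.
 */
static bool should_update_avg(int periods, unsigned long delta_us)
{
        return periods || delta_us >= UPDATE_THRESHOLD_US;
}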