Subject: Re: [PATCH v2 1/3] sched/pelt: Don't sync hardly util_sum with uti_avg
From: Dietmar Eggemann
To: Vincent Guittot, mingo@redhat.com, peterz@infradead.org, juri.lelli@redhat.com,
    rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
    linux-kernel@vger.kernel.org, rickyiu@google.com, odin@uged.al
Cc: sachinp@linux.vnet.ibm.com, naresh.kamboju@linaro.org
Date: Tue, 4 Jan 2022 12:47:10 +0100
Message-ID: <9e526482-905c-e759-8aa6-1ff84bb5b2a3@arm.com>
In-Reply-To: <20211222093802.22357-2-vincent.guittot@linaro.org>

On 22/12/2021 10:38, Vincent Guittot wrote:

s/util_sum with uti_avg/util_sum with util_avg/ in the subject line.

[...]

> +#define MIN_DIVIDER (LOAD_AVG_MAX - 1024)

Shouldn't this be in pelt.h?

[...]

> @@ -3466,13 +3466,30 @@ update_tg_cfs_util(struct cfs_rq *cfs_rq, struct sched_entity *se, struct cfs_rq
>  	 */
>  	divider = get_pelt_divider(&cfs_rq->avg);
>  
> +
>  	/* Set new sched_entity's utilization */
>  	se->avg.util_avg = gcfs_rq->avg.util_avg;
> -	se->avg.util_sum = se->avg.util_avg * divider;
> +	new_sum = se->avg.util_avg * divider;
> +	delta_sum = (long)new_sum - (long)se->avg.util_sum;
> +	se->avg.util_sum = new_sum;
>  
>  	/* Update parent cfs_rq utilization */
> -	add_positive(&cfs_rq->avg.util_avg, delta);
> -	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> +	add_positive(&cfs_rq->avg.util_avg, delta_avg);
> +	add_positive(&cfs_rq->avg.util_sum, delta_sum);
> +
> +	/*
> +	 * Because of rounding, se->util_sum might ends up being +1 more than
> +	 * cfs->util_sum (XXX fix the rounding). Although this is not
> +	 * a problem by itself, detaching a lot of tasks with the rounding
> +	 * problem between 2 updates of util_avg (~1ms) can make cfs->util_sum
> +	 * becoming null whereas cfs_util_avg is not.
> +	 * Check that util_sum is still above its lower bound for the new
> +	 * util_avg.
> +	 * Given that period_contrib might have moved since the last
> +	 * sync, we are only sure that util_sum must be above or equal to
> +	 * util_avg * minimum possible divider
> +	 */
> +	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> +				     cfs_rq->avg.util_avg * MIN_DIVIDER);
>  }
>  

I still wonder whether the regression comes only from the changes in
update_cfs_rq_load_avg() introduced by commit 1c35b07e6d39, and whether
it could be fixed by this part of the patch-set alone (A):

@@ -3677,15 +3706,22 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)

 		r = removed_load;
 		sub_positive(&sa->load_avg, r);
-		sa->load_sum = sa->load_avg * divider;
+		sub_positive(&sa->load_sum, r * divider);
+		sa->load_sum = max_t(u32, sa->load_sum,
+				     sa->load_avg * MIN_DIVIDER);

 		r = removed_util;
 		sub_positive(&sa->util_avg, r);
-		sa->util_sum = sa->util_avg * divider;
+		sub_positive(&sa->util_sum, r * divider);
+		sa->util_sum = max_t(u32, sa->util_sum,
+				     sa->util_avg * MIN_DIVIDER);

 		r = removed_runnable;
 		sub_positive(&sa->runnable_avg, r);
-		sa->runnable_sum = sa->runnable_avg * divider;
+		sub_positive(&sa->runnable_sum, r * divider);
+		sa->runnable_sum = max_t(u32, sa->runnable_sum,
+					 sa->runnable_avg * MIN_DIVIDER);

i.e. without changing update_tg_cfs_X() (and
detach_entity_load_avg()/dequeue_load_avg()).

update_load_avg()
  update_cfs_rq_load_avg()      <---
  propagate_entity_load_avg()
    update_tg_cfs_X()           <---

I didn't see the SCHED_WARN_ON() [cfs_rq_is_decayed()] when looping on
hackbench at several different sched group levels on a Hikey620 (Arm64,
8 CPUs, SMP, 4 taskgroups: A/B C/D E/F G/H), >12h uptime.

Rick is probably in a position to test whether this would be sufficient
to cure the CPU frequency regression.

I can see that you want to use the same _avg/_sum sync in
detach_entity_load_avg()/dequeue_load_avg() as in
update_cfs_rq_load_avg(). (B)

And finally in update_tg_cfs_X() as well, plus down-propagating _sum
independently of _avg. (C)

So rather than splitting the patch-set by X (util, runnable, load), the
whole change might IMHO be easier to handle when split into (A), (B) and
(C) (obviously only in case (A) cures the regression).

>  static inline void
> @@ -3681,7 +3698,9 @@ update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
>  
>  		r = removed_util;
>  		sub_positive(&sa->util_avg, r);
> -		sa->util_sum = sa->util_avg * divider;
> +		sub_positive(&sa->util_sum, r * divider);
> +		/* See update_tg_cfs_util() */
> +		sa->util_sum = max_t(u32, sa->util_sum, sa->util_avg * MIN_DIVIDER);
>  
>  		r = removed_runnable;
>  		sub_positive(&sa->runnable_avg, r);
> @@ -3780,7 +3799,11 @@ static void detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
>  
>  	dequeue_load_avg(cfs_rq, se);
>  	sub_positive(&cfs_rq->avg.util_avg, se->avg.util_avg);
> -	cfs_rq->avg.util_sum = cfs_rq->avg.util_avg * divider;
> +	sub_positive(&cfs_rq->avg.util_sum, se->avg.util_sum);
> +	/* See update_tg_cfs_util() */
> +	cfs_rq->avg.util_sum = max_t(u32, cfs_rq->avg.util_sum,
> +				     cfs_rq->avg.util_avg * MIN_DIVIDER);
> +

Maybe add a:

Fixes: fcf6631f3736 ("sched/pelt: Ensure that *_sum is always synced with *_avg")

[...]

This max_t() should make sure that `_sum is always >= _avg * MIN_DIVIDER`,
which is currently not always the case. Right now this is done in

(1) update_cfs_rq_load_avg()

(2) detach_entity_load_avg() and dequeue_load_avg()

(3) update_tg_cfs_X()

but not in attach_entity_load_avg() and enqueue_load_avg(). What's the
reason for this?
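
For completeness, here is a small user-space toy sketch (my own code, not
from the patch) of the lower-bound invariant the max_t() enforces on the
removal path. struct pelt_avg, sub_pos() and remove_util() are made-up
stand-ins for struct sched_avg, sub_positive() and the removed-util handling
in update_cfs_rq_load_avg(); LOAD_AVG_MAX/MIN_DIVIDER just mirror the kernel
values, and the divider argument stands in for get_pelt_divider():

#include <stdio.h>

#define LOAD_AVG_MAX	47742
#define MIN_DIVIDER	(LOAD_AVG_MAX - 1024)	/* divider when period_contrib == 0 */

struct pelt_avg {			/* toy stand-in for the util part of sched_avg */
	unsigned long util_avg;
	unsigned int  util_sum;
};

/* sub_positive()-like helper: subtract but never drop below zero */
static void sub_pos(unsigned long *v, unsigned long d)
{
	*v = (*v > d) ? *v - d : 0;
}

/*
 * Remove a detached entity's contribution (r from util_avg, r * divider from
 * util_sum), then clamp util_sum to util_avg * MIN_DIVIDER: since
 * period_contrib may have moved since the last sync, that product is the
 * smallest util_sum consistent with the remaining util_avg.
 */
static void remove_util(struct pelt_avg *sa, unsigned long r, unsigned long divider)
{
	unsigned long sum = sa->util_sum;

	sub_pos(&sa->util_avg, r);
	sub_pos(&sum, r * divider);

	if (sum < sa->util_avg * MIN_DIVIDER)	/* the max_t(u32, ...) in the patch */
		sum = sa->util_avg * MIN_DIVIDER;
	sa->util_sum = (unsigned int)sum;
}

int main(void)
{
	/* cfs_rq last synced right after a period rollover: divider == MIN_DIVIDER */
	struct pelt_avg sa = { .util_avg = 200, .util_sum = 200 * MIN_DIVIDER };

	/*
	 * A task with util_avg = 100 is removed while the current divider is
	 * larger (period_contrib = 1000): r * divider removes more from
	 * util_sum than the task contributed at the last sync, so util_sum
	 * undershoots and the clamp pulls it back to util_avg * MIN_DIVIDER.
	 */
	remove_util(&sa, 100, MIN_DIVIDER + 1000);

	printf("util_avg=%lu util_sum=%u lower_bound=%lu\n",
	       sa.util_avg, sa.util_sum, sa.util_avg * (unsigned long)MIN_DIVIDER);
	return 0;
}

Without the clamp the same sequence can leave util_sum well below (or at)
zero while util_avg is still non-zero, which is the cfs_rq_is_decayed()
inconsistency discussed above.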