Date: Mon, 14 Dec 2015 12:32:54 +0000
From: Morten Rasmussen
To: bsegall@google.com
Cc: Dietmar Eggemann, Andrey Ryabinin, Peter Zijlstra, mingo@redhat.com,
	linux-kernel@vger.kernel.org, yuyang.du@intel.com, Paul Turner
Subject: Re: [PATCH] sched/fair: fix mul overflow on 32-bit systems

On Fri, Dec 11, 2015 at 11:18:56AM -0800, bsegall@google.com wrote:
> Dietmar Eggemann writes:
> > IMHO, on 32bit machine we can deal with (2147483648/47742/1024 = 43.9)
> > 43 tasks before overflowing.
> >
> > Can we have a scenario where >43 tasks with se->avg.util_avg=1024 value
> > get migrated (migrate_task_rq_fair()) or die (task_dead_fair()) or a
> > task group dies (free_fair_sched_group()) which has a se->avg.util_avg
> > > 44981 for a specific cpu before the atomic_long_xchg() happens in
> > update_cfs_rq_load_avg()? Never saw this in my tests so far on ARM
> > machines.
>
> First, I believe in theory util_avg on a cpu should add up to 100% or
> 1024 or whatever. However, recently migrated-in tasks don't have their
> utilization cleared, so if they were quickly migrated again you could
> have up to the number of cpus or so times 100%, which could lead to
> overflow here.

Not only that, just creating new tasks can cause the overflow. As Yuyang
already pointed out in this thread, tasks are initialized to 100%, so
spawning n_cpus*44 of them should almost guarantee overflow for at least
one rq in the system.
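
To make the arithmetic above concrete, here is a standalone userspace
sketch, not kernel code (LOAD_AVG_MAX = 47742 as in the kernel;
UTIL_SCALE is just a local name for the 1024 utilization scale):

	#include <stdio.h>

	#define LOAD_AVG_MAX	47742	/* sum of the PELT geometric series */
	#define UTIL_SCALE	1024	/* util_avg of one 100%-utilized task */

	int main(void)
	{
		/*
		 * The removed utilization r gets multiplied by LOAD_AVG_MAX
		 * to adjust util_sum. On a 32-bit system r is a 32-bit long,
		 * so the product must stay below 2^31.
		 */
		long long limit = 1LL << 31;

		/* largest removed util_avg that does not overflow: 44981 */
		printf("max r: %lld\n", limit / LOAD_AVG_MAX);

		/* number of 100%-utilized tasks that fits: 43 */
		printf("max tasks: %lld\n", limit / LOAD_AVG_MAX / UTIL_SCALE);

		/* 44 freshly spawned tasks at 100% already overflow: */
		long long r = 44 * UTIL_SCALE;	/* 45056 > 44981 */
		printf("r * LOAD_AVG_MAX = %lld > 2147483647\n",
		       r * LOAD_AVG_MAX);
		return 0;
	}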
> This just leads to more questions though:
>
> The whole removed_util_avg thing doesn't seem to make a ton of sense -
> the code doesn't add util_avg for a migrating task onto
> cfs_rq->avg.util_avg, and doing so would regularly give >100% values (it
> does so on attach/detach where it's less likely to cause issues, but not
> migration). Removing it only makes sense if the task has accumulated all
> that utilization on this cpu, and even then mostly only makes sense if
> this is the only task on the cpu (and then it would make sense to add it
> on migrate-enqueue). The whole add-on-enqueue-migrate,
> remove-on-dequeue-migrate thing comes from /load/, where doing so is a
> more globally applicable approximation than it is for utilization,
> though it could still be useful as a fast-start/fast-stop approximation,
> if the add-on-enqueue part was added. It could also I guess be cleared
> on migrate-in, as basically the opposite assumption (or do something
> like add on enqueue, up to 100% and then set the se utilization to the
> amount actually added or something).

Migrated tasks are already added to cfs_rq->avg.util_avg (as Yuyang
already pointed out), which gives us a very responsive metric for cpu
utilization. util_avg > 100% is currently a quite common transient
scenario; it happens very often when creating new tasks. Unless we
always clear util_avg on migration (including wake-up migration) we will
have to deal with util_avg > 100%, but clearing would make per-entity
utilization tracking useless. The whole point, as I see it, is to have a
utilization metric which can deal with task migrations.

We do however have to be very clear about the meaning of util_avg. It
has very little meaning, for both the sched_entities and the cfs_rq,
when cfs_rq->avg.util_avg > 100%. All we can say then is that the cpu is
quite likely overutilized. But for lightly utilized systems it gives us
a very responsive and fairly accurate estimate of the cpu utilization,
and it can be used to estimate the cpu utilization change caused by
migrating a task.

> If the choice was to not do the add/remove thing, then se->avg.util_sum
> would be unused except for attach/detach, which currently do the
> add/remove thing. It's not unreasonable for them, except that currently
> nothing uses anything other than the root's utilization, so migration
> between cgroups wouldn't actually change the relevant util number
> (except it could because changing the cfs_rq util_sum doesn't actually
> do /anything/ unless it's the root, so you'd have to wait until the
> cgroup se actually changed in utilization).

We use util_avg extensively in the energy model RFC patches, and I think
it is worth considering using both cfs_rq->avg.util_avg and
se->avg.util_avg to improve select_task_rq_fair(). util_avg for task
groups has a quite different meaning than load_avg: where load_avg is
scaled to ensure that the combined contribution of a group never exceeds
that of a single always-running task, util_avg for groups reflects the
true cpu utilization of the group. I agree that tracking util_avg for
groups is redundant and could be removed if it can be done in a clean
way.

> So uh yeah, my initial impression is "rip it out", but if being
> immediately-correct is important in the case of one task being most of
> the utilization, rather than when it is more evenly distributed, it
> would probably make more sense to instead put in the add-on-enqueue
> code.

I would prefer if it stayed in. There are several patch sets posted for
review that use util_avg.
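
For reference, a simplified sketch of the xchg path Dietmar mentioned in
update_cfs_rq_load_avg() (approximating the code under discussion, not a
verbatim quote; the r * LOAD_AVG_MAX product in the util branch is the
32-bit mul this patch is about):

	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
		long r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
		sa->load_avg = max_t(long, sa->load_avg - r, 0);
		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
	}

	if (atomic_long_read(&cfs_rq->removed_util_avg)) {
		long r = atomic_long_xchg(&cfs_rq->removed_util_avg, 0);
		sa->util_avg = max_t(long, sa->util_avg - r, 0);
		/*
		 * On 32-bit, r and LOAD_AVG_MAX are both 32-bit, so
		 * r * LOAD_AVG_MAX wraps once r > 2^31/47742 = 44981,
		 * i.e. once ~44 tasks' worth of 1024 has been removed.
		 */
		sa->util_sum = max_t(s32, sa->util_sum - r * LOAD_AVG_MAX, 0);
	}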