Received: by 2002:a05:6358:11c7:b0:104:8066:f915 with SMTP id i7csp1136409rwl; Wed, 5 Apr 2023 12:19:07 -0700 (PDT) X-Google-Smtp-Source: AKy350bsSh2Lb4HrLuntuhc28a3hUa7dhWD1mRygnqwtXKEGXG/HClLONMZkJLolO+5B3t9OmbJG X-Received: by 2002:a17:90b:4a8e:b0:23f:7e2e:fe13 with SMTP id lp14-20020a17090b4a8e00b0023f7e2efe13mr8156757pjb.0.1680722347306; Wed, 05 Apr 2023 12:19:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680722347; cv=none; d=google.com; s=arc-20160816; b=yzpdYn3EUR2iIciZGQhGnDBWry+ZIwKeww7/pBj+mDRLIdqjwl59Pt/u3X9zPZjcV2 fM4aVueVz27yshADwSSgFu5rbhqJRquP8ZxqpyKbFCyC+qWzMwHNK0PhzHvNwRrDCPhw 49Cr8KyFfgmG7gtVR7d8C7LvbcAsrIIpsGG9Wyr1BK7+U/ezubQK8Qkio+ofFB6mvQvi yPGzMcSqJGCWCmIExlMk/ANjkSPr9E82W2aZtID8qchRa9wF57R0OXF1fjwa53qJAH0Z QEJS5AYuhusj3fkC5iX3W68V+40fHfYbtPoA85MZhwZXxBx9fMbyImS4CCBj1DzWsvH9 IxVA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=63oi9q+4waqzZKDqYVHTFtlgzhGFiHzIE9Pkom2KQUw=; b=CBp8oM1rKJtWjFPg7iKVJ63wUGZOmpSjz0m2rM/H2FbVu9LNiEWKyiLsT/M8WCYAAC HrS1FTuz6qVehCe80fhafjwaYARUqgTVQHZFd5h4vhvc/KTfhRpeEuBduSUXQUUVrQyI DowIWNoBCvXzZG04PWpxD1nnP3w2cMhPuPI+UNyWLN5RMErTqU22iIyYmoqLVDuAHhDH 2Eja9nf2RTDshZGD6C9G9xHyux8y2BPlMhm+nxbiAFrKzgQX9zD3JNEPp7qa7QnPghX8 fFVpYFuCux/XCSOvP+pov5542sWwL4Cahd4Nk/4iWZlGbj2oUYmhlt44cUPlIqGJhZiX PsvA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@infradead.org header.s=desiato.20200630 header.b=PI+v6YGi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id p7-20020a17090a868700b0023f043385bcsi1955465pjn.111.2023.04.05.12.18.55; Wed, 05 Apr 2023 12:19:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@infradead.org header.s=desiato.20200630 header.b=PI+v6YGi; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S234066AbjDETOp (ORCPT + 99 others); Wed, 5 Apr 2023 15:14:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39432 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232079AbjDETOo (ORCPT ); Wed, 5 Apr 2023 15:14:44 -0400 Received: from desiato.infradead.org (desiato.infradead.org [IPv6:2001:8b0:10b:1:d65d:64ff:fe57:4e05]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 03B5FC3 for ; Wed, 5 Apr 2023 12:14:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=desiato.20200630; h=In-Reply-To:Content-Transfer-Encoding: Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date: Sender:Reply-To:Content-ID:Content-Description; bh=63oi9q+4waqzZKDqYVHTFtlgzhGFiHzIE9Pkom2KQUw=; b=PI+v6YGiuBlzKMa6KXYuYVgLye S6zSNAAJwhDW6FGVob0BaL1zJ3d1I/Je8YtE+YySZIZ+hdUO9nHj2J8iv/SCZ936De7dApCCmX6Kw 3f30AuWiXy8mmT9k3IUklyH901TFaUdwIPtp6lqm6JhDg6w5SwsZUl3QU37eVEw0+hnIga88O+rnN Eqq4yjheXlI+6g1Q6hcrW40OEwNcmut2jAioE0k2jILDdu4o75mi0cIWYIs6eYgHSWhHyryGee9f9 fHMxdVl/QioLn1WAbsF4tAaH7stSTUIDp4zks+CSPbW5MRku3BIN1gQbOEwAtvEGkNg7cJS4XHwdW KEkjKQJw==; Received: from j130084.upc-j.chello.nl ([24.132.130.84] helo=noisy.programming.kicks-ass.net) by desiato.infradead.org with esmtpsa (Exim 4.96 #2 (Red Hat Linux)) id 1pk8Zb-00A3c5-1m; Wed, 05 Apr 2023 19:13:35 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id 780D2300202; Wed, 5 Apr 2023 21:13:30 +0200 (CEST) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id 5BA8626378075; Wed, 5 Apr 2023 21:13:30 +0200 (CEST) Date: Wed, 5 Apr 2023 21:13:30 +0200 From: Peter Zijlstra To: Josh Don Cc: mingo@kernel.org, vincent.guittot@linaro.org, linux-kernel@vger.kernel.org, juri.lelli@redhat.com, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de, bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com, patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz, qperret@google.com, tim.c.chen@linux.intel.com, timj@gnu.org, kprateek.nayak@amd.com, yu.c.chen@intel.com, youssefesmat@chromium.org, joel@joelfernandes.org, efault@gmx.de Subject: Re: [PATCH 04/17] sched/fair: Add avg_vruntime Message-ID: <20230405191330.GA365932@hirez.programming.kicks-ass.net> References: <20230328092622.062917921@infradead.org> <20230328110353.853385546@infradead.org> <20230329075051.GJ4253@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20230329075051.GJ4253@hirez.programming.kicks-ass.net> X-Spam-Status: No, score=-2.5 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE,SPF_NONE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Mar 29, 2023 at 09:50:51AM +0200, Peter Zijlstra wrote: > On Tue, Mar 28, 2023 at 04:57:49PM -0700, Josh Don wrote: > > On Tue, Mar 28, 2023 at 4:06 AM Peter Zijlstra wrote: > > [...] > > > +/* > > > + * Compute virtual time from the per-task service numbers: > > > + * > > > + * Fair schedulers conserve lag: \Sum lag_i = 0 > > > + * > > > + * lag_i = S - s_i = w_i * (V - v_i) > > > + * > > > + * \Sum lag_i = 0 -> \Sum w_i * (V - v_i) = V * \Sum w_i - \Sum w_i * v_i = 0 > > > > Small note: I think it would be helpful to label these symbols > > somewhere :) Weight and vruntime are fairly obvious, but I don't > > think 'S' and 'V' are as clear. Are these non-virtual ideal service > > time, and average vruntime, respectively? > > Yep, they are. I'll see what I can do with the comments. /* * Compute virtual time from the per-task service numbers: * * Fair schedulers conserve lag: * * \Sum lag_i = 0 * * Where lag_i is given by: * * lag_i = S - s_i = w_i * (V - v_i) * * Where S is the ideal service time and V is it's virtual time counterpart. * Therefore: * * \Sum lag_i = 0 * \Sum w_i * (V - v_i) = 0 * \Sum w_i * V - w_i * v_i = 0 * * From which we can solve an expression for V in v_i (which we have in * se->vruntime): * * \Sum v_i * w_i \Sum v_i * w_i * V = -------------- = -------------- * \Sum w_i W * * Specifically, this is the weighted average of all entity virtual runtimes. * * [[ NOTE: this is only equal to the ideal scheduler under the condition * that join/leave operations happen at lag_i = 0, otherwise the * virtual time has non-continguous motion equivalent to: * * V +-= lag_i / W * * Also see the comment in place_entity() that deals with this. ]] * * However, since v_i is u64, and the multiplcation could easily overflow * transform it into a relative form that uses smaller quantities: * * Substitute: v_i == (v_i - v0) + v0 * * \Sum ((v_i - v0) + v0) * w_i \Sum (v_i - v0) * w_i * V = ---------------------------- = --------------------- + v0 * W W * * Which we track using: * * v0 := cfs_rq->min_vruntime * \Sum (v_i - v0) * w_i := cfs_rq->avg_vruntime * \Sum w_i := cfs_rq->avg_load * * Since min_vruntime is a monotonic increasing variable that closely tracks * the per-task service, these deltas: (v_i - v), will be in the order of the * maximal (virtual) lag induced in the system due to quantisation. * * Also, we use scale_load_down() to reduce the size. * * As measured, the max (key * weight) value was ~44 bits for a kernel build. */ And the comment in place_entity() (slightly updated since this morning): /* * If we want to place a task and preserve lag, we have to * consider the effect of the new entity on the weighted * average and compensate for this, otherwise lag can quickly * evaporate. * * Lag is defined as: * * lag_i = S - s_i = w_i * (V - v_i) * * To avoid the 'w_i' term all over the place, we only track * the virtual lag: * * vl_i = V - v_i <=> v_i = V - vl_i * * And we take V to be the weighted average of all v: * * V = (\Sum w_j*v_j) / W * * Where W is: \Sum w_j * * Then, the weighted average after adding an entity with lag * vl_i is given by: * * V' = (\Sum w_j*v_j + w_i*v_i) / (W + w_i) * = (W*V + w_i*(V - vl_i)) / (W + w_i) * = (W*V + w_i*V - w_i*vl_i) / (W + w_i) * = (V*(W + w_i) - w_i*l) / (W + w_i) * = V - w_i*vl_i / (W + w_i) * * And the actual lag after adding an entity with vl_i is: * * vl'_i = V' - v_i * = V - w_i*vl_i / (W + w_i) - (V - vl_i) * = vl_i - w_i*vl_i / (W + w_i) * * Which is strictly less than vl_i. So in order to preserve lag * we should inflate the lag before placement such that the * effective lag after placement comes out right. * * As such, invert the above relation for vl'_i to get the vl_i * we need to use such that the lag after placement is the lag * we computed before dequeue. * * vl'_i = vl_i - w_i*vl_i / (W + w_i) * = ((W + w_i)*vl_i - w_i*vl_i) / (W + w_i) * * (W + w_i)*vl'_i = (W + w_i)*vl_i - w_i*vl_i * = W*vl_i * * vl_i = (W + w_i)*vl'_i / W */