Date: Fri, 30 Sep 2022 14:49:31 +0100
From: Qais Yousef
To: Joel Fernandes
Cc: Peter Zijlstra, LKML, Steven Rostedt, juri.lelli@redhat.com,
    vincent.guittot@linaro.org, Youssef Esmat, Dietmar Eggemann,
    Thomas Gleixner
Subject: Re: Sum of weights idea for CFS PI
Message-ID: <20220930134931.mpopdvri4xuponw2@wubuntu>
Hi Joel

I'm interested in the topic; if I can be CC'ed on any future discussions I'd
appreciate it :)

On 09/29/22 16:38, Joel Fernandes wrote:
> Hi Peter, all,
>
> Just following up on the idea Peter suggested at LPC22 about using a sum of
> weights to solve the CFS priority inversion issues via priority inheritance.
> I am not sure if a straightforward summation of the weights of the
> dependencies in the chain is sufficient (or whether it may cause too much
> unfairness).
>
> I think it will work if all the tasks on the CPU are at 100% utilization:
>
> Say you have 4 tasks (A, B, C, D) running, each with equal weight (W) except
> for A, which has twice the weight (2W). The CPU bandwidth distribution is
> then (assuming all are running):
>
> A: 2/5
> B, C, D: 1/5 each
>
> Say that out of the 4 tasks, 3 are part of a classical priority inversion
> scenario (A, B and C).
>
> Now say A blocks on a lock whose owner C is running. Because A has blocked,
> B gets 1/3 of the bandwidth, whereas it should have been limited to 1/5. To
> remedy this, say you give C a weight of 2W. B now gets 1/4 of the bandwidth -
> still not fair, since B is eating into the CPU bandwidth and causing the
> priority inversion we want to remedy.
>
> The correct bandwidth distribution should be (B and D unchanged):
>
> B = 1/5
> D = 1/5
> C = 3/5
>
> This means that C's weight should be 3W, with B and D at W each as before.
> So indeed, C's new weight is its original weight PLUS the weight of A -
> that's needed to keep the CPU usage of the other tasks (B, D) in check, so
> that C makes forward progress on behalf of A and the other tasks don't eat
> into the CPU utilization.
>
> However, I think this will kind of fall apart if A is asleep 50% of the time
> (assume the sleep is because of I/O and unrelated to the PI chain).
>
> Because now, if all were running (and assuming no PI dependencies) with A
> being 50% busy, the bandwidth of B, C and D would each be divided into 2
> components:
>
> a. when A is running, it would be as above.
> b. when A is sleeping, B, C and D would get 1/3 each.
>
> So on average, B, C and D get: (1/3 + 1/5) / 2 = 8/30 each. This gives A
> about 6/30, or 1/5, of the bandwidth.

The average metric is an interesting one. It can be confusing to reason
about, too.

I think we have 3 events to take into account here, not 2:

a. A is running and NOT blocked on C.
b. A is running but BLOCKED on C.
c. A is sleeping.

This means A, B, C and D's shares will be:

        A  ,  B  ,  C  ,  D
a.     2/5 , 1/5 , 1/5 , 1/5
b.      -  , 1/5 , 3/5 , 1/5
c.      -  , 1/3 , 1/3 , 1/3

Since A is sleeping for 50% of the time, I don't think we can assume an equal
distribution across the 3 events (we can't just divide by 3). I believe we can
assume that:

a. occurs 25% of the time
b. occurs 25% of the time
c. occurs 50% of the time

I *think* this should provide something more representative.
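To put rough numbers on that (just a back-of-the-envelope sketch, assuming the
25%/25%/50% split above and that the 3W boost goes to the lock owner C for the
whole time A is blocked on it):

#include <stdio.h>

int main(void)
{
	/*
	 * Per-event CPU shares for A, B, C, D ('-' in the table above
	 * means A consumes no CPU in that event).  Row b assumes the 3W
	 * boost goes to the lock owner C while A is blocked on it.
	 */
	double shares[3][4] = {
		{ 2.0/5, 1.0/5, 1.0/5, 1.0/5 },	/* a: A running, not blocked */
		{ 0.0,   1.0/5, 3.0/5, 1.0/5 },	/* b: A blocked on C */
		{ 0.0,   1.0/3, 1.0/3, 1.0/3 },	/* c: A sleeping */
	};
	double prob[3] = { 0.25, 0.25, 0.50 };	/* assumed time split a/b/c */
	const char *task = "ABCD";
	int t, e;

	for (t = 0; t < 4; t++) {
		double avg = 0.0;

		for (e = 0; e < 3; e++)
			avg += prob[e] * shares[e][t];
		printf("%c: %.3f (~%.0f/30)\n", task[t], avg, avg * 30);
	}
	return 0;
}

If I got the arithmetic right, that lands at roughly 3/30 for A, 8/30 for B
and D, and 11/30 for C. Note that B and D see 8/30 either way, since events a
and b look identical to them; what changes is how A's 6/30 is split between A
itself and the boosted C.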
> But now say A happens to block on a lock that C is holding. You would boost
> C to weight 3W, which gives it 3/5 (or 18/30) as we saw above - more than
> what C should actually get.
>
> C should get (8/30 + 6/30) = 14/30 AFAICS.
>
> Hopefully one can see that a straight summation of weights is not enough. It
> needs to be something like:
>
> C's new weight = C's original weight + (A's weight) * (A's utilization)
>
> Or something like that; otherwise the inherited weight may be too much to
> properly solve it.
>
> Any thoughts on this? You mentioned you had some notes on this and/or proxy
> execution - could you share them?

I assume we'll be using the rt-mutex inheritance property to handle this? If
this was discussed during a talk, I'd appreciate a link to it.

In the past, at the OSPM conference, we brought up an issue with performance
inversion, where a task running on a smaller (slower, to be more generic) CPU
is holding the lock and causing massive delays for the waiters. This is an
artefact of DVFS; for HMP there's an additional cause due to the unequal
capacities of the CPUs.

Proxy execution seems to be a nice solution to all of these problems, but it's
a long way away. I'm interested to learn how this inheritance will be
implemented, and whether there are any userspace conversion issues, i.e. do we
need to convert all locks to rt-mutex locks?

Thanks

--
Qais Yousef
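P.S. On the userspace conversion question, for reference: the existing opt-in
for kernel-assisted inheritance is PTHREAD_PRIO_INHERIT, which glibc
implements with PI futexes backed by rt_mutex in the kernel. A minimal sketch
of what that opt-in looks like today (error handling omitted):

#include <pthread.h>

int main(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t lock;

	pthread_mutexattr_init(&attr);
	/*
	 * Request priority inheritance; on Linux this makes the mutex a
	 * PI futex (FUTEX_LOCK_PI), backed by rt_mutex in the kernel.
	 */
	pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	pthread_mutex_init(&lock, &attr);

	pthread_mutex_lock(&lock);
	/* ... critical section ... */
	pthread_mutex_unlock(&lock);

	pthread_mutex_destroy(&lock);
	pthread_mutexattr_destroy(&attr);
	return 0;
}

Presumably the open question is whether ordinary (non-PI) locks would have to
be converted to something like this to benefit, or whether the new mechanism
can avoid that.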