Date: Tue, 25 May 2021 12:46:52 +0200
From: Peter Zijlstra
To: changhuaixin
Cc: Benjamin Segall, Dietmar Eggemann, dtcccc@linux.alibaba.com,
    Juri Lelli, khlebnikov@yandex-team.ru, open list, Mel Gorman,
    Ingo Molnar, Odin Ugedal, pauld@redhat.com, Paul Turner,
    Steven Rostedt, Shanpei Chen, Tejun Heo, Vincent Guittot,
    xiyou.wangcong@gmail.com, luca.abeni@santannapisa.it,
    tommaso.cucinotta@santannapisa.it, baruah@wustl.edu,
    anderson@cs.unc.edu
Subject: Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller
References: <20210520123419.8039-1-changhuaixin@linux.alibaba.com>
 <20210520123419.8039-2-changhuaixin@linux.alibaba.com>

On Mon, May 24, 2021 at 08:42:03PM +0800, changhuaixin wrote:
> On May 21, 2021, at 10:00 PM, Peter Zijlstra wrote:
> >
> > On Thu, May 20, 2021 at 08:34:17PM +0800, Huaixin Chang wrote:
> >> The CFS bandwidth controller limits the CPU requests of a task
> >> group to quota during each period. However, parallel workloads
> >> might be bursty, so that they get throttled even when their average
> >> utilization is under quota; and since they are latency sensitive at
> >> the same time, throttling them is undesirable.
> >>
> >> Scaling up period and quota allows greater burst capacity, but it
> >> might cause a longer stall until the next refill. Introduce "burst"
> >> to allow accumulating unused quota from previous periods, to be
> >> spent when a task group requests more CPU than quota during a
> >> specific period.
> >>
> >> Introducing a burst buffer might also cause interference with other
> >> groups. Thus, cap the maximum accumulated buffer at "burst", and
> >> cap the maximum allowed burst at quota, too.
> >
> > Overall, *much* better than before.
> >
> > However, I would like a little more discussion of how exactly people
> > are supposed to reason about this. That will also help with the
> > question from Odin on how people are supposed to set/compute this
> > burst value.
> >
> > So traditional (UP-EDF) bandwidth control is something like:
> >
> >   (U = \Sum u_i) <= 1
> >
> > This guarantees both that every deadline is met and that the system
> > is stable. After all, if U were > 1, then for every second of
> > walltime we'd have to run more than a second of program time, and
> > we'd obviously miss our deadline; but the next deadline would be
> > further out still, so there is never time to catch up: unbounded
> > fail.
> >
> > This work observes that a workload doesn't always execute its full
> > quota; this enables one to describe u_i as a statistical
> > distribution.
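A minimal sketch of the refill rule the changelog above describes,
using hypothetical names rather than the actual patch code (the
invariant is simply that unused quota carries over, capped at
quota + burst):

	#include <stdint.h>

	/* Sketch only: per-period refill with a burst buffer. */
	struct bw_pool {
		uint64_t quota;    /* runtime granted each period */
		uint64_t burst;    /* max carry-over on top of quota */
		uint64_t runtime;  /* runtime currently available */
	};

	static void refill_period(struct bw_pool *b)
	{
		/* grant a fresh quota on top of whatever was left unused */
		b->runtime += b->quota;

		/* never accumulate more than one quota plus the burst */
		if (b->runtime > b->quota + b->burst)
			b->runtime = b->quota + b->burst;
	}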
> > For example, have u_i = {x,e}_i, where x is the p(95) and x+e the
> > p(100) (the traditional WCET). This effectively allows u to be
> > smaller, increasing the efficiency (we can pack more tasks in the
> > system), but at the cost of missing deadlines when all the odds line
> > up. However, it does maintain stability, since every overrun must be
> > paired with an underrun as long as our x is above the average.
> >
> > That is, suppose we have 2 tasks, both of which specify a p(95)
> > value; then we have a p(95)*p(95) = 90.25% chance that both tasks
> > are within their quota and everything is good. At the same time we
> > have a p(5)*p(5) = 0.25% chance that both tasks will exceed their
> > quota at the same time (guaranteed deadline fail). Somewhere in
> > between there's a threshold where one exceeds and the other doesn't
> > underrun enough to compensate; this depends on the specific CDFs.
> >
> > At the same time, we can say that the worst-case deadline miss will
> > be \Sum e_i; that is, there is a bounded tardiness (under the
> > assumption that x+e is indeed WCET).

Having second thoughts about this exact claim: lightning can strike
twice, and if we exceed the bounds again before having recovered from
the last time, we might exceed the bound mentioned. I _think_ the
property holds, but the bound might need work.

> > And I think you can compute more fun properties.
> >
> > Now, CFS bandwidth control is not EDF, and the above doesn't fully
> > translate, but much of it does, I think.
> >
> > We borrow time now against our future underrun, at the cost of
> > increased interference against the other system users. All nicely
> > bounded etc..

> I shall improve the commit log then. Thanks!
>
> We did some computation on the probability that the deadline is
> missed, and on the expected bound. These values are calculated for
> different numbers of control groups and varying CPU utilization, with
> runtime drawn from an exponential, Poisson, or Pareto distribution.
>
> The more control groups there are, the more likely the deadline is
> met and the smaller the average WCET to expect, because many equal
> control groups mean a small chance of U > 1.
>
> And the more under-utilized the whole system is, the more likely the
> deadline is met and the smaller the average WCET to expect.
>
> More details are posted in
> https://lore.kernel.org/lkml/5371BD36-55AE-4F71-B9D7-B86DC32E3D2B@linux.alibaba.com/.

Indeed you did; I'm a bit sad it's so hard to find papers that cover
this. When one Googles for 'Probabilistic WCET' there's a fair number
of papers on using Extreme Value Theory to estimate the traditional
WCET from measurement-based input, many from the excellent WCET track
at ECRTS.

The thing is, the last time I attended that conference (which appears
to be almost 4 years ago :/), I'm sure I spoke to people about exactly
the thing explored here. Albeit, at the time we discussed it as a
SCHED_DEADLINE task model extension.

Let me Cc a bunch of people that might know more...
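As a footnote to the p(95) arithmetic above: assuming independent
groups, as in the two-task computation, the joint odds generalize
directly to n groups. A trivial tabulation (a sketch, not part of the
thread):

	#include <math.h>
	#include <stdio.h>

	/* For n independent groups, each provisioned at its own p(95):
	 * odds that nobody overruns, and that everybody overruns at
	 * once. n = 2 reproduces the 90.25% / 0.25% figures above. */
	int main(void)
	{
		for (int n = 1; n <= 8; n++)
			printf("n=%d  all within: %.4f  all exceed: %.8f\n",
			       n, pow(0.95, n), pow(0.05, n));
		return 0;
	}

As n grows, a simultaneous overrun of every group becomes vanishingly
unlikely, in line with the observation above that many equal control
groups make U > 1 improbable.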