Subject: Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst
From: changhuaixin
Date: Sat, 20 Mar 2021 10:06:52 +0800
To: Peter Zijlstra
Cc: changhuaixin, Benjamin Segall, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
    khlebnikov@yandex-team.ru, open list, mgorman@suse.de, mingo@redhat.com, Odin Ugedal,
    Odin Ugedal, pauld@redhead.com, Paul Turner, rostedt@goodmis.org, Shanpei Chen,
    Tejun Heo, Vincent Guittot, xiyou.wangcong@gmail.com
In-Reply-To: <2F207CE6-F849-457A-B0A6-3A8BFFE0AFFB@linux.alibaba.com>
References: <20210316044931.39733-1-changhuaixin@linux.alibaba.com>
    <20210316044931.39733-2-changhuaixin@linux.alibaba.com>
    <2F207CE6-F849-457A-B0A6-3A8BFFE0AFFB@linux.alibaba.com>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 19, 2021, at 8:39 PM, changhuaixin wrote:
>
>
>
>> On Mar 18, 2021, at 11:05 PM, Peter Zijlstra wrote:
>>
>> On Thu, Mar 18, 2021 at 09:26:58AM +0800, changhuaixin wrote:
>>>> On Mar 17, 2021, at 4:06 PM, Peter Zijlstra wrote:
>>
>>>> So what is the typical avg,stdev,max and mode for the workloads where you find
>>>> you need this?
>>>>
>>>> I would really like to put a limit on the burst. IMO a workload that has
>>>> a burst many times longer than the quota is plain broken.
>>>
>>> I see. Then the problem comes down to how large the limit on burst shall be.
>>>
>>> I have sampled the CPU usage of a bursty container in 100ms periods. The statistics are:
>>
>> So CPU usage isn't exactly what is required, job execution time is what
>> you're after. Assuming there is a relation...
>>
>
> Yes, job execution time is important. To be specific, it is to improve the CPU usage of the whole
> system to reduce the total cost of ownership, while not damaging job execution time. This
> requires lower the average CPU resource of underutilized cgroups, and allowing their bursts
> at the same time.
>
>>> average : 42.2%
>>> stddev : 81.5%
>>> max : 844.5%
>>> P95 : 183.3%
>>> P99 : 437.0%
>>
>> Then your WCET is 844% of 100ms ? , which is .84s.
>>
>> But you forgot your mode; what is the most common duration, given P95 is
>> so high, I doubt that avg is representative of the most common duration.
>>
>
> It is true.
>
>>> If quota is 100000ms, burst buffer needs to be 8 times more in order
>>> for this workload not to be throttled.
>>
>> Where does that 100s come from? And an 800s burst is bizarre.
>>
>> Did you typo [us] as [ms] ?
>>
>
> Sorry, it should be 100000us.
>
>>> I can't say this is typical, but these workloads exist. On a machine
>>> running Kubernetes containers, where there is often room for such
>>> burst and the interference is hard to notice, users would prefer
>>> allowing such burst to being throttled occasionally.
>>
>> Users also want ponies. I've no idea what kubernetes actually is or what
>> it has to do with containers. That's all just word salad.
>>
>>> In this sense, I suggest limit burst buffer to 16 times of quota or
>>> around. That should be enough for users to improve tail latency caused
>>> by throttling. And users might choose a smaller one or even none, if
>>> the interference is unacceptable. What do you think?
>>
>> Well, normal RT theory would suggest you pick your runtime around 200%
>> to get that P95 and then allow a full period burst to get your P99, but
>> that same RT theory would also have you calculate the resulting
>> interference and see if that works with the rest of the system...
>>
>
> I am sorry that I don't know much about the RT theory you mentioned, and can't provide
> the desired calculation now. But I'd like to try and do some reading if that is needed.
>
>> 16 times is horrific.
>
> So can we decide on a more relative value now? Or is the interference probabilities still the
> missing piece?

A more [realistic] value, I mean.

>
> Is the paper you mentioned about called "Insensitivity results in statistical bandwidth sharing",
> or some related ones on statistical bandwidth results under some kind of fairness?
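
For reference, the statistics asked for above (avg, stddev, max, P95, P99, mode) and the effect of a burst buffer on throttling can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch's implementation: summarize(), pct_to_us() and throttled_periods() are hypothetical helper names, the percentile is nearest-rank, the stddev is the population form, the mode is taken over 10% bins, and the throttling model simply carries unused runtime over to the next period, capped at quota + burst.

```python
import statistics

PERIOD_US = 100_000  # 100ms period, matching the sampling interval in the thread


def summarize(usage_pct):
    """Summarize per-period CPU usage samples (percent of one CPU)."""
    ordered = sorted(usage_pct)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile: rank = ceil(n * p / 100), 1-based.
        rank = max(1, (n * p + 99) // 100)
        return ordered[rank - 1]

    return {
        "avg": statistics.fmean(ordered),
        "stddev": statistics.pstdev(ordered),  # population stddev (assumption)
        "max": ordered[-1],
        "p95": percentile(95),
        "p99": percentile(99),
        # Mode over 10% bins, since raw float samples rarely repeat exactly.
        "mode": statistics.mode([int(u // 10) * 10 for u in ordered]),
    }


def pct_to_us(pct):
    """Convert a per-period usage percentage into microseconds of runtime."""
    return round(pct * PERIOD_US / 100)


def throttled_periods(usage_pct, quota_pct, burst_pct):
    """Count periods whose demand exceeds the runtime available (simplified model)."""
    available = quota_pct  # start with one period's quota, no saved burst
    throttled = 0
    for demand in usage_pct:
        if demand > available:
            throttled += 1
            available = 0  # unmet demand is simply dropped in this sketch
        else:
            available -= demand
        # Refill by one quota; carried-over runtime is capped at quota + burst.
        available = min(available + quota_pct, quota_pct + burst_pct)
    return throttled
```

With made-up samples such as [20, 20, 20, 20, 400, 20] and a quota of 100%, this model throttles the 400% period when burst is 0 or 200%, and does not throttle it when burst is 300%, because four mostly idle periods have saved up enough runtime by then. pct_to_us(844.5) gives 844500, i.e. the roughly .84s mentioned above.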