Subject: Re: [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst
From: changhuaixin
Date: Thu, 18 Mar 2021 09:26:58 +0800
Cc: changhuaixin, Benjamin Segall, dietmar.eggemann@arm.com, juri.lelli@redhat.com, khlebnikov@yandex-team.ru, open list, mgorman@suse.de, mingo@redhat.com, Odin Ugedal, Odin Ugedal, pauld@redhead.com, Paul Turner, rostedt@goodmis.org, Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong@gmail.com
References: <20210316044931.39733-1-changhuaixin@linux.alibaba.com>
 <20210316044931.39733-2-changhuaixin@linux.alibaba.com>
To: Peter Zijlstra
X-Mailing-List: linux-kernel@vger.kernel.org

> On Mar 17, 2021, at 4:06 PM, Peter Zijlstra wrote:
>
> On Wed, Mar 17, 2021 at 03:16:18PM +0800, changhuaixin wrote:
>
>>> Why do you allow such a large burst? I would expect something like:
>>>
>>>	if (burst > quota)
>>>		return -EINVAL;
>>>
>>> That limits the variance in the system. Allowing super long bursts seems
>>> to defeat the entire purpose of bandwidth control.
>>
>> I understand your concern. Surely large burst value might allow super
>> long bursts thus preventing bandwidth control entirely for a long
>> time.
>>
>> However, I am afraid it is hard to decide what the maximum burst
>> should be from the bandwidth control mechanism itself. Allowing some
>> burst to the maximum of quota is helpful, but not enough. There are
>> cases where workloads are bursty that they need many times more than
>> quota in a single period. In such cases, limiting burst to the maximum
>> of quota fails to meet the needs.
>>
>> Thus, I wonder whether is it acceptable to leave the maximum burst to
>> users. If the desired behavior is to allow some burst, configure burst
>> accordingly. If that is causing variance, use share or other fairness
>> mechanism. And if fairness mechanism still fails to coordinate, do not
>> use burst maybe.
>
> It's not fairness, bandwidth control is about isolation, and burst
> introduces interference.
>
>> In this way, cfs_b->buffer can be removed while cfs_b->max_overrun is
>> still needed maybe.
>
> So what is the typical avg,stdev,max and mode for the workloads where you find
> you need this?
>
> I would really like to put a limit on the burst. IMO a workload that has
> a burst many times longer than the quota is plain broken.

I see. Then the problem comes down to how large the limit on burst should be.
I have sampled the CPU usage of a bursty container in 100ms periods. The
statistics are:

    average : 42.2%
    stddev  : 81.5%
    max     : 844.5%
    P95     : 183.3%
    P99     : 437.0%

If quota is 100000us (100% of one CPU per 100ms period), the burst buffer
needs to be about 8 times the quota for this workload not to be throttled.
I can't say this is typical, but such workloads exist. On a machine running
Kubernetes containers, where there is often room for such bursts and the
interference is hard to notice, users would rather allow such bursts than be
throttled occasionally.

In this sense, I suggest limiting the burst buffer to around 16 times the
quota. That should be enough for users to improve the tail latency caused by
throttling. And users might choose a smaller burst, or none at all, if the
interference is unacceptable. What do you think?
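
To make the arithmetic concrete, here is a minimal userspace sketch. It is not code from the patch: the names burst_needed_us(), validate_burst() and MAX_BURST_FACTOR are hypothetical, and the 16x cap is only the value proposed in this mail. It assumes a 100000us period, a 100000us quota, and usage expressed in tenths of a percent of one CPU.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical cap from the proposal above: burst <= 16 * quota. */
#define MAX_BURST_FACTOR 16ULL

/*
 * CPU time (in us) that one period needs beyond its quota in order not
 * to be throttled. usage_permille is the usage in that period in tenths
 * of a percent of one CPU, e.g. 8445 for 844.5%.
 */
static uint64_t burst_needed_us(uint64_t period_us, uint64_t quota_us,
                                uint64_t usage_permille)
{
    uint64_t used_us = period_us * usage_permille / 1000;

    return used_us > quota_us ? used_us - quota_us : 0;
}

/* Reject burst values above the hypothetical cap, accept the rest. */
static int validate_burst(uint64_t quota_us, uint64_t burst_us)
{
    if (burst_us > MAX_BURST_FACTOR * quota_us)
        return -EINVAL;
    return 0;
}
```

With the sampled maximum of 844.5%, burst_needed_us(100000, 100000, 8445) comes to 744500us, a bit under 8 times the quota, so a 16x cap would leave roughly a factor of two of headroom for this workload. The sketch only looks at a single period and assumes the full burst has already been accumulated.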