Subject: Re: [PATCH v3 0/4] sched/fair: Burstable CFS bandwidth controller
From: changhuaixin
Date: Tue, 26 Jan 2021 18:18:59 +0800
To: changhuaixin
Cc: bsegall@google.com, dietmar.eggemann@arm.com, juri.lelli@redhat.com,
    khlebnikov@yandex-team.ru, linux-kernel@vger.kernel.org, mgorman@suse.de,
    mingo@redhat.com, pauld@redhead.com, peterz@infradead.org, pjt@google.com,
    rostedt@goodmis.org, shanpeic@linux.alibaba.com, vincent.guittot@linaro.org,
    xiyou.wangcong@gmail.com
In-Reply-To: <20210121110453.18899-1-changhuaixin@linux.alibaba.com>
References: <20201217074620.58338-1-changhuaixin@linux.alibaba.com>
    <20210121110453.18899-1-changhuaixin@linux.alibaba.com>
Message-Id: <9FD4A7E9-B545-40AB-A5B5-66DF37991474@linux.alibaba.com>

> On Jan 21, 2021, at 7:04 PM, Huaixin Chang wrote:
>
> Changelog
>
> v3:
> 1. Fix another issue reported by test robot.
> 2. Update docs as Randy Dunlap suggested.
>
> v2:
> 1. Fix an issue reported by test robot.
> 2. Rewrite docs. Appreciate any further suggestions or help.
>
> The CFS bandwidth controller limits the CPU requests of a task group to
> quota during each period. However, parallel workloads might be bursty,
> so they get throttled. They are also latency sensitive, so throttling
> them is undesirable.
>
> Scaling up period and quota allows greater burst capacity, but it might
> cause longer stalls until the next refill. We introduce "burst" to allow
> accumulating unused quota from previous periods, to be assigned when a
> task group requests more CPU than quota during a specific period. This
> allows CPU time requests to exceed quota in a given period as long as
> the average requested CPU time stays below quota in the long run. The
> maximum accumulation is capped by burst, which is set to 0 by default,
> so the traditional behaviour is preserved.
>
> A huge drop in 99th percentile tail latency, from more than 500ms to
> 27ms, is seen for real Java workloads when using burst. Similar drops
> are seen when testing with schbench too:
>
> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
> echo 700000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
> echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
>
> # The average CPU usage is around 500%, which is 200ms CPU time
> # every 40ms.
> ./schbench -m 1 -t 30 -r 60 -c 10000 -R 500
>
> Without burst:
>
> Latency percentiles (usec)
> 50.0000th: 7
> 75.0000th: 8
> 90.0000th: 9
> 95.0000th: 10
> *99.0000th: 933
> 99.5000th: 981
> 99.9000th: 3068
> min=0, max=20054
> rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%
>
> With burst:
>
> Latency percentiles (usec)
> 50.0000th: 7
> 75.0000th: 8
> 90.0000th: 9
> 95.0000th: 9
> *99.0000th: 12
> 99.5000th: 13
> 99.9000th: 19
> min=0, max=406
> rps: 498.36 p95 (usec) 9 p99 (usec) 12 p95/cputime 0.09% p99/cputime 0.12%
>
> How much workloads will benefit from burstable CFS bandwidth control
> depends on how bursty and how latency sensitive they are.
>
> Previously, Cong Wang and Konstantin Khlebnikov proposed a similar
> feature:
> https://lore.kernel.org/lkml/20180522062017.5193-1-xiyou.wangcong@gmail.com/
> https://lore.kernel.org/lkml/157476581065.5793.4518979877345136813.stgit@buzz/
>
> This time we present more latency statistics and handle overflow while
> accumulating.
>
> Huaixin Chang (4):
>   sched/fair: Introduce primitives for CFS bandwidth burst
>   sched/fair: Make CFS bandwidth controller burstable
>   sched/fair: Add cfs bandwidth burst statistics
>   sched/fair: Add document for burstable CFS bandwidth control
>
>  Documentation/scheduler/sched-bwc.rst |  49 +++++++++++--
>  include/linux/sched/sysctl.h          |   2 +
>  kernel/sched/core.c                   | 126 +++++++++++++++++++++++++++++-----
>  kernel/sched/fair.c                   |  58 +++++++++++++---
>  kernel/sched/sched.h                  |   9 ++-
>  kernel/sysctl.c                       |  18 +++++
>  6 files changed, 232 insertions(+), 30 deletions(-)
>
> --
> 2.14.4.44.g2045bb6

Ping: any new comments on this patchset? If there are no other concerns, I think it is ready to be merged.
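To make the carry-over rule from the cover letter concrete, here is a minimal C model of it. It is a hypothetical sketch, not code from the patchset: the names (`struct bw_pool`, `refill()`, `run_period()`, `throttled_time()`) and the demand figures are illustrative only, all times are in microseconds, the pool starts empty, and demand that exceeds the available runtime is simply counted as throttled rather than deferred to the next period. Each period adds one quota to the pool and then clamps it to quota + burst, so at most `burst` of unused quota can accumulate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of a burstable CFS bandwidth pool.
 * Not the kernel implementation; all times are in microseconds.
 */
struct bw_pool {
	uint64_t quota;   /* runtime granted per period, cf. cpu.cfs_quota_us */
	uint64_t burst;   /* max unused quota that may carry over, cf. cpu.cfs_burst_us */
	uint64_t runtime; /* runtime currently left in the pool */
};

/* Period start: add one quota, then cap the pool at quota + burst. */
static void refill(struct bw_pool *p)
{
	p->runtime += p->quota;
	if (p->runtime > p->quota + p->burst)
		p->runtime = p->quota + p->burst;
}

/*
 * Run one period with the given CPU demand and return the part of it
 * that exceeds the available runtime (counted as throttled time; a
 * real group would instead wait for the next refill).
 */
static uint64_t run_period(struct bw_pool *p, uint64_t demand)
{
	uint64_t served;

	refill(p);
	served = demand < p->runtime ? demand : p->runtime;
	p->runtime -= served;
	return demand - served;
}

/* Total throttled time over n consecutive periods, starting from an empty pool. */
static uint64_t throttled_time(uint64_t quota, uint64_t burst,
			       const uint64_t *demand, size_t n)
{
	struct bw_pool p = { .quota = quota, .burst = burst, .runtime = 0 };
	uint64_t throttled = 0;

	assert(quota > 0);
	for (size_t i = 0; i < n; i++)
		throttled += run_period(&p, demand[i]);
	return throttled;
}
```

With quota 700000, a hypothetical per-period demand of { 300000, 300000, 1100000 } (average below quota) and burst 0, `throttled_time()` reports 400000 us throttled in the third period, because the pool is clamped back to 700000 at every refill. With burst 400000, the unused runtime carries over (capped at 1100000), so the same spike is fully served and nothing is throttled.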