Subject: Re: [PATCH v5 1/3] sched/fair: Introduce the burstable CFS controller
From: changhuaixin <changhuaixin@linux.alibaba.com>
Date: Fri, 21 May 2021 17:09:55 +0800
Cc: Benjamin Segall, Dietmar Eggemann, dtcccc@linux.alibaba.com, Juri Lelli,
    khlebnikov@yandex-team.ru, open list, Mel Gorman, Ingo Molnar,
    pauld@redhat.com, Peter Zijlstra, Paul Turner, Steven Rostedt,
    Shanpei Chen, Tejun Heo, Vincent Guittot, xiyou.wangcong@gmail.com
Message-Id: <447D741B-F430-4502-BCA6-C2A12118A2D2@linux.alibaba.com>
References: <20210520123419.8039-1-changhuaixin@linux.alibaba.com>
            <20210520123419.8039-2-changhuaixin@linux.alibaba.com>
To: Odin Ugedal
List-ID: linux-kernel@vger.kernel.org

> On May 20, 2021, at 10:00 PM, Odin Ugedal wrote:
>
> Hi,
>
> Here are some more thoughts and questions:
>
>> The benefit of burst is seen when testing with schbench:
>>
>> echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
>> echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
>> echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
>> echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us
>>
>> # The average CPU usage is around 500%, which is 200ms CPU time
>> # every 40ms.
>> ./schbench -m 1 -t 30 -r 10 -c 10000 -R 500
>>
>> Without burst:
>>
>> Latency percentiles (usec)
>>         50.0000th: 7
>>         75.0000th: 8
>>         90.0000th: 9
>>         95.0000th: 10
>>         *99.0000th: 933
>>         99.5000th: 981
>>         99.9000th: 3068
>>         min=0, max=20054
>> rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%
>
> It should be noted that this was running on a 64-core machine (if that was
> the case, ref. your previous patch).
>
> I am curious how much you have tried tweaking both the period and the
> quota for this workload. I assume a longer period can help such a bursty
> application, and from the small slowdowns, a slightly higher quota could
> also help, I guess. I am not saying this is a bad idea, but we need to
> understand what it fixes, and how, in order to be able to understand
> how/if to use it.

Yes, it is a carefully tuned workload and configuration. I did this because
benchmarks like schbench generate load in a fixed pattern without burst, so
I chose the schbench parameters carefully to generate burst within each
100ms period and demonstrate that burst works. A longer period or a higher
quota does help; in that case a heavier workload can be used to generate
tail latency instead.

In my view, burst is the token-bucket way of looking at CFS bandwidth
control.
With the present CFS bandwidth controller, the bucket capacity is strictly
limited to the quota; this patchset raises it to quota + burst. The extra
capacity is spent when tasks would otherwise get throttled while the system
as a whole is underutilized.

> Also, what value of the sysctl kernel.sched_cfs_bandwidth_slice_us are
> you using? What CONFIG_HZ you are using is also interesting, due to how
> bw is accounted for. There is some more info about it here:
> Documentation/scheduler/sched-bwc.rst. I assume a smaller slice value may
> also help, and it would be interesting to see what implications it gives.
> A high threads to (quota/period) ratio, together with a high
> bandwidth_slice, will probably cause some throttling, so one has to
> choose between precision and overhead.

The default values of kernel.sched_cfs_bandwidth_slice_us (5ms) and
CONFIG_HZ (1000) are used. The following case can be used to reproduce
throttling caused by many threads together with a large bandwidth slice:

	mkdir /sys/fs/cgroup/cpu/test
	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us

	./schbench -m 1 -t 3 -r 20 -c 80000 -R 20

On my machine, two workers work for 80ms and sleep for 120ms in each round,
so the average utilization is around 80%. This also works on a two-core
system. It is worth running it multiple times, as getting throttled doesn't
necessarily cause tail latency for schbench.

> Also, here you give a burst of 66% of the quota. Would that be a typical
> value for a cgroup, or is it just a result of testing?

Indeed, it is not a typical value; it was tuned for this test.

> As I understand this patchset, your example would allow 600% constant CPU
> load, then one period with 1000% load, then another "long set" of periods
> with 600% load. Have you discussed a way of limiting how long burst can
> be "saved" before expiring?

I haven't thought much about that. It is interesting, but I doubt the need
for it.
>> @@ -9427,7 +9478,8 @@ static int cpu_max_show(struct seq_file *sf, void *v)
>>  {
>>  	struct task_group *tg = css_tg(seq_css(sf));
>>
>> -	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg));
>> +	cpu_period_quota_print(sf, tg_get_cfs_period(tg), tg_get_cfs_quota(tg),
>> +			       tg_get_cfs_burst(tg));
>>  	return 0;
>>  }
>
> The current cgroup v2 docs say the following:
>
>> cpu.max
>>   A read-write two value file which exists on non-root cgroups.
>>   The default is "max 100000".
>
> This will become a "three value file", and I know a few user space
> projects who parse this file by splitting on the middle space. I am not
> sure if they are "wrong", but I don't think we usually break such things.
> Not sure what Tejun thinks about this.

Thanks, it will be modified in the way Tejun suggests.

> Thanks
> Odin