Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp250515pxb; Fri, 29 Oct 2021 09:10:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwoFApQD2fJEPdptudQAVgVf+2NQQIzUvhv2Ijshhb4argRJXvXkhngtZcMUhnMDNfk9j4i X-Received: by 2002:a05:6e02:ea5:: with SMTP id u5mr7898941ilj.285.1635523853156; Fri, 29 Oct 2021 09:10:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1635523853; cv=none; d=google.com; s=arc-20160816; b=TYGBZanDSdg5z+SOUh5uBspbTe9tm239cj6Fde95zx/Vr+zjBWa4lABhinTLPOgePD xA0eg6FNS6Q7sBCLj760pBVoZeTZnq9Nj5Q3gdB3+j52eVtkkSiCLoz8IsYEEJSuBxQE 17iRw1bfCXavgJzWEWdMSStb/5W6ftUsi3l3mefYOYS0vimaFglBeNi2eb1wWwLBwJ54 NVsuPwZ/UYhyQcZAbXFmjdy2f6mE3ciuR7Owh5p7lxv6dW3JOx49thX5pqoBD82ua7yn t0hUYIjbrg0qF8LsNh2cC1dvp1zvmL8FLckdvffJTbJ+MxTqtvVPogYUJ9lYFoRc1ERs Msdw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:subject:message-id:date:from:in-reply-to :references:mime-version:dkim-signature; bh=YWzMrxI3A0GqZWQ4duWHF4d7CuKb2hbjO4jWM4LNe/g=; b=THVlh5y11JZJRyTxEdDeJ4JI4QJEPLrkNmAM8/WU+kYYW2cit7tt/Hdab9N+e3Bb4H v620c425y39sEnFDYKs2Wr0ZdVxVXUDZXuBbvOqim0MFTu52FyDB2971aZG7zBiCH3CQ Gc1WkOXLKw7uTAadsi2bPQEuHphKX5IlXo5zN3O9gCmVbLCw6YIBqR28rYZ8e6lz4KoP l4yj0FXBRIbejAoKJY38fH+q4T8vJLmoUVDGvtSQv7uF5DASCkWgiWzsALQU/mcxtpod dJJIenuRGxky7JTbR/WkHcaz8uMNEFj7FE6rtGt8/S1hJvZXOmepMT2FWDoSjtgHaIvT OC5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YhCH7ElF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id 8si371345ilx.88.2021.10.29.09.10.23; Fri, 29 Oct 2021 09:10:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=YhCH7ElF; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229979AbhJ2QKM (ORCPT + 99 others); Fri, 29 Oct 2021 12:10:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39880 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229971AbhJ2QKL (ORCPT ); Fri, 29 Oct 2021 12:10:11 -0400 Received: from mail-yb1-xb30.google.com (mail-yb1-xb30.google.com [IPv6:2607:f8b0:4864:20::b30]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EB2C9C061714 for ; Fri, 29 Oct 2021 09:07:42 -0700 (PDT) Received: by mail-yb1-xb30.google.com with SMTP id v138so19922619ybb.8 for ; Fri, 29 Oct 2021 09:07:42 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=YWzMrxI3A0GqZWQ4duWHF4d7CuKb2hbjO4jWM4LNe/g=; b=YhCH7ElFjNrnUa01kXAkeFHVDC5WKKIjd6WJzbJMVg3djovUJnsqqMyROQvrZc7tHn OfquniUmGcmkDmcYDWj1fEdswb7hZs9rmmQ1BEQ0tSLvgzq4dR8ugiC5Gi9c/9XngzJ8 MuzZoYgS9aC5D4r4GG2LNFS04knj/h1t/qfeAHlzuWB8ltrJSHqtZ7ExkpEUIyo9FVbZ RW8uVQ8ty/eTvfTFxUfMgjjBIP2yzelqGcMPXaScCUNQS1Nb6/05nD4STVTd1Cy4AkIC K0/QTxU2GJyrmhp4KhcyrmyTAG9fHVpP/8OmmbiU8u9tqfbB+MVw5arFY9uPo+jSJVgZ jYuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=YWzMrxI3A0GqZWQ4duWHF4d7CuKb2hbjO4jWM4LNe/g=; b=iR3TQOJiKOB1KCu6++kcsVA+i3m2NGsUPsZO+vrJ5zAsX+ouXNB5aHTEBfG0zrc+Zc kwYThrPMsgzCgWZNrKaoYBF6LIJ3OlrlvfoQrqcSGJ/byFbddCGVg83cPFo0aFRSvIvU HGm4yzuDrR48Am82fr+IFj8Kd9VrLbdU5Wkeah7PYaejMT3TZeijzzGUiWK1yTLikNmm FLAJSf7yNrCXhuF07Qn9WquOcf+ClZo2FaBVF0/qZkvvU8uf+C0puU+CJXOK7WBfY0AQ krQTHhUzRyqb5VqAwfKeDeYaib5a6pJ+2KTG61ull/StkS77RVAvp3kBRXNbHGzleTTk k0ag== X-Gm-Message-State: AOAM530/HNX8FUwvVs83sz0VUX/H07p+wL/LrY/sbdVeGZprd32RPhNB e/dZUPpNroV3lRC885ajSh/QcLLbhbWg7kZlM6dhqf/Z1JA= X-Received: by 2002:a05:6902:1148:: with SMTP id p8mr14430735ybu.402.1635523662081; Fri, 29 Oct 2021 09:07:42 -0700 (PDT) MIME-Version: 1.0 References: <20211028094834.1312-1-mgorman@techsingularity.net> <20211028094834.1312-3-mgorman@techsingularity.net> In-Reply-To: <20211028094834.1312-3-mgorman@techsingularity.net> From: Vincent Guittot Date: Fri, 29 Oct 2021 18:07:30 +0200 Message-ID: Subject: Re: [PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity To: Mel Gorman Cc: Peter Zijlstra , Ingo Molnar , Valentin Schneider , Aubrey Li , Barry Song , Mike Galbraith , Srikar Dronamraju , LKML Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 28 Oct 2021 at 11:49, Mel Gorman wrote: > > Commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs") > moved the kernel.sched_wakeup_granularity_ns sysctl under debugfs. > One of the reasons why this sysctl may be used may be for "optimising > for throughput", particularly when overloaded. The tool TuneD sometimes > alters this for two profiles e.g. "mssql" and "throughput-performance". At > least version 2.9 does but it changed in master where it also will poke > at debugfs instead. This patch aims to reduce the motivation to tweak > sysctl_sched_wakeup_granularity by increasing sched_wakeup_granularity > if the running task runtime has not exceeded sysctl_sched_min_granularity. > > During task migration or wakeup, a decision is made on whether > to preempt the current task or not. To limit over-scheduled, > sysctl_sched_wakeup_granularity delays the preemption to allow at least 1ms > of runtime before preempting. However, when a domain is heavily overloaded > (e.g. hackbench), the degree of over-scheduling is still severe. This is > problematic as time is wasted rescheduling tasks that could instead be > used by userspace tasks. > > However, care must be taken. Even if a system is overloaded, there may > be high priority threads that must still be able to run. Mike Galbraith > explained the constraints as follows; > > CFS came about because the O1 scheduler was unfair to the > point it had starvation problems. People pretty much across the > board agreed that a fair scheduler was a much way better way > to go, and CFS was born. It didn't originally have the sleep > credit business, but had to grow it to become _short term_ fair. > Ingo cut the sleep credit in half because of overscheduling, and > that has worked out pretty well all told.. but now you're pushing > it more in the unfair direction, all the way to extremely unfair > for anything and everything very light. > > Fairness isn't the holy grail mind you, and at some point, giving > up on short term fairness certainly isn't crazy, as proven by your > hackbench numbers and other numbers we've seen over the years, > but taking bites out of the 'CF' in the CFS that was born to be a > corner-case killer is.. worrisome. The other shoe will drop.. it > always does :) > > This patch increases the wakeup granularity if the current task has not > reached its minimum preemption granularity. The current task may still > be preempted but the difference in runtime must be higher. > > hackbench-process-pipes > 5.15.0-rc3 5.15.0-rc3 > sched-wakeeflips-v1r1sched-scalewakegran-v3r2 > Amean 1 0.3890 ( 0.00%) 0.3823 ( 1.71%) > Amean 4 0.5217 ( 0.00%) 0.4867 ( 6.71%) > Amean 7 0.5387 ( 0.00%) 0.5053 ( 6.19%) > Amean 12 0.5443 ( 0.00%) 0.5450 ( -0.12%) > Amean 21 0.6487 ( 0.00%) 0.6807 ( -4.93%) > Amean 30 0.8033 ( 0.00%) 0.7107 * 11.54%* > Amean 48 1.2400 ( 0.00%) 1.0447 * 15.75%* > Amean 79 1.8200 ( 0.00%) 1.6033 * 11.90%* > Amean 110 2.5820 ( 0.00%) 2.0763 * 19.58%* > Amean 141 3.2203 ( 0.00%) 2.5313 * 21.40%* > Amean 172 3.8200 ( 0.00%) 3.1163 * 18.42%* > Amean 203 4.3357 ( 0.00%) 3.5560 * 17.98%* > Amean 234 4.8047 ( 0.00%) 3.8913 * 19.01%* > Amean 265 5.1243 ( 0.00%) 4.2293 * 17.47%* > Amean 296 5.5940 ( 0.00%) 4.5357 * 18.92%* > > 5.15.0-rc3 5.15.0-rc3 > sched-wakeeflips-v1r1 sched-scalewakegran-v3r2 > Duration User 2567.27 2034.17 > Duration System 21098.79 17137.08 > Duration Elapsed 136.49 120.2 > > Signed-off-by: Mel Gorman > --- > kernel/sched/fair.c | 17 +++++++++++++++-- > kernel/sched/features.h | 2 ++ > 2 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index d00af3b97d8f..dee108470297 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7052,10 +7052,23 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) > } > #endif /* CONFIG_SMP */ > > -static unsigned long wakeup_gran(struct sched_entity *se) > +static unsigned long > +wakeup_gran(struct sched_entity *curr, struct sched_entity *se) > { > unsigned long gran = sysctl_sched_wakeup_granularity; > > + if (sched_feat(SCALE_WAKEUP_GRAN)) { > + unsigned long delta_exec; > + > + /* > + * Increase the wakeup granularity if curr's runtime > + * is less than the minimum preemption granularity. > + */ > + delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; > + if (delta_exec < sysctl_sched_min_granularity) > + gran += sysctl_sched_min_granularity; I need to think a bit more about corner cases but this change looks much better than the previous one. > + } > + > /* > * Since its curr running now, convert the gran from real-time > * to virtual-time in his units. > @@ -7094,7 +7107,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) > if (vdiff <= 0) > return -1; > > - gran = wakeup_gran(se); > + gran = wakeup_gran(curr, se); > if (vdiff > gran) > return 1; > > diff --git a/kernel/sched/features.h b/kernel/sched/features.h > index 7f8dace0964c..611591355ffd 100644 > --- a/kernel/sched/features.h > +++ b/kernel/sched/features.h > @@ -95,3 +95,5 @@ SCHED_FEAT(LATENCY_WARN, false) > > SCHED_FEAT(ALT_PERIOD, true) > SCHED_FEAT(BASE_SLICE, true) > + > +SCHED_FEAT(SCALE_WAKEUP_GRAN, true) > -- > 2.31.1 >