Received: by 2002:a05:6a10:5bc5:0:0:0:0 with SMTP id os5csp707812pxb; Thu, 21 Oct 2021 08:02:28 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwKhk9K0f83BSuBr5dxAp/PZgzis35celVrQofNUJUY2DNIjqbl75P14pPTUMJM3bH8l9aq X-Received: by 2002:a17:906:3d62:: with SMTP id r2mr7853095ejf.174.1634828548168; Thu, 21 Oct 2021 08:02:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1634828548; cv=none; d=google.com; s=arc-20160816; b=j4Z9KGfQ/O28GNomuQTyP+Kf0g2h56VTUcPLCGGh7OkgCqQ1fAXaY4hLHBiAGle8Ql tTZS8oC8zMOIpoAHebBEmx33hmeq8SyHVbYiTEvYXbZXUrFEQTzmz9Fg9703reYDGzEF qMNR9pqxiKISl81dInug9PyUezlQbeM/UFxpyfdWlsmfeWqxSbqyQF8yvZ1KrnH26vff fRkb8d4ytnasoqi0/DbLJv3ZQlb9HAGCCPNOKuYdGAJptjzgvllJ5Z2yTOzFDjXJrJmP 3iHcsBkaltTG5XWy7sNBm+ucDzioqaUhLgxp/sjZ4MUyRVormKQjoc/OBwywVnd/fDQD 33rQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=48Zpbo+AmJbe9ur+Zk4I7Uc/q4LlPMGuB13RP1aSb3M=; b=e1vXz8m8vNn82Bz7sFieSlyBVjeJC35pfuGWdL9nsfTHZ4jRs7p0IAAD0tLZN35TX0 i6siSqQJXF7UH90jjVq6Bjb3RJ9piznN/pRwyAIM9BvxPOsDgzNg08Iy3j0PAJGtoE1q 4Q8r/1nIGAK5iW7DMsX07HrhmwbfOLEMkf4tT7Kj2qSX7QPeYMF2aXnLCBYQ3/K+YVj4 R594nocwzl4DMj/1O3N3Rj/UnZdirXffqzLMPQSE8W1NmiATgb83aRHibcBGLlpT7lD3 MLcwphPW4oLCkbzDuTsVHbqiEDaFHJUiagDNuWNPut/2iDgK/d1VhxF6pgo1C2UWCAEP niWg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id j1si6444463edf.29.2021.10.21.08.01.59; Thu, 21 Oct 2021 08:02:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231169AbhJUO6y (ORCPT + 99 others); Thu, 21 Oct 2021 10:58:54 -0400 Received: from outbound-smtp37.blacknight.com ([46.22.139.220]:43213 "EHLO outbound-smtp37.blacknight.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231408AbhJUO6v (ORCPT ); Thu, 21 Oct 2021 10:58:51 -0400 Received: from mail.blacknight.com (pemlinmail05.blacknight.ie [81.17.254.26]) by outbound-smtp37.blacknight.com (Postfix) with ESMTPS id 856C51EE1 for ; Thu, 21 Oct 2021 15:56:34 +0100 (IST) Received: (qmail 10586 invoked from network); 21 Oct 2021 14:56:34 -0000 Received: from unknown (HELO stampy.112glenside.lan) (mgorman@techsingularity.net@[84.203.17.29]) by 81.17.254.9 with ESMTPA; 21 Oct 2021 14:56:34 -0000 From: Mel Gorman To: Peter Zijlstra Cc: Ingo Molnar , Vincent Guittot , Valentin Schneider , Aubrey Li , Barry Song , Mike Galbraith , Srikar Dronamraju , LKML , Mel Gorman Subject: [PATCH 2/2] sched/fair: Increase wakeup_gran if current task has not executed the minimum granularity Date: Thu, 21 Oct 2021 15:56:03 +0100 Message-Id: <20211021145603.5313-3-mgorman@techsingularity.net> X-Mailer: git-send-email 2.31.1 In-Reply-To: <20211021145603.5313-1-mgorman@techsingularity.net> References: <20211021145603.5313-1-mgorman@techsingularity.net> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to debugfs") moved the kernel.sched_wakeup_granularity_ns sysctl under debugfs. One of the reasons why this sysctl may be used may be for "optimising for throughput", particularly when overloaded. The tool TuneD sometimes alters this for two profiles e.g. "mssql" and "throughput-performance". At least version 2.9 does but it changed in master where it also will poke at debugfs instead. This patch aims to reduce the motivation to tweak sysctl_sched_wakeup_granularity by increasing sched_wakeup_granularity if the running task runtime has not exceeded sysctl_sched_min_granularity. During task migration or wakeup, a decision is made on whether to preempt the current task or not. To limit over-scheduled, sysctl_sched_wakeup_granularity delays the preemption to allow at least 1ms of runtime before preempting. However, when a domain is heavily overloaded (e.g. hackbench), the degree of over-scheduling is still severe. This is problematic as time is wasted rescheduling tasks that could instead be used by userspace tasks. However, care must be taken. Even if a system is overloaded, there may be high priority threads that must still be able to run. Mike Galbraith explained the constraints as follows; CFS came about because the O1 scheduler was unfair to the point it had starvation problems. People pretty much across the board agreed that a fair scheduler was a much way better way to go, and CFS was born. It didn't originally have the sleep credit business, but had to grow it to become _short term_ fair. Ingo cut the sleep credit in half because of overscheduling, and that has worked out pretty well all told.. but now you're pushing it more in the unfair direction, all the way to extremely unfair for anything and everything very light. Fairness isn't the holy grail mind you, and at some point, giving up on short term fairness certainly isn't crazy, as proven by your hackbench numbers and other numbers we've seen over the years, but taking bites out of the 'CF' in the CFS that was born to be a corner-case killer is.. worrisome. The other shoe will drop.. it always does :) This patch increases the wakeup granularity if the current task has not reached its minimum preemption granularity. The current task may still be preempted but the difference in runtime must be higher. hackbench-process-pipes 5.15.0-rc3 5.15.0-rc3 sched-wakeeflips-v1r1sched-scalewakegran-v3r2 Amean 1 0.3890 ( 0.00%) 0.3823 ( 1.71%) Amean 4 0.5217 ( 0.00%) 0.4867 ( 6.71%) Amean 7 0.5387 ( 0.00%) 0.5053 ( 6.19%) Amean 12 0.5443 ( 0.00%) 0.5450 ( -0.12%) Amean 21 0.6487 ( 0.00%) 0.6807 ( -4.93%) Amean 30 0.8033 ( 0.00%) 0.7107 * 11.54%* Amean 48 1.2400 ( 0.00%) 1.0447 * 15.75%* Amean 79 1.8200 ( 0.00%) 1.6033 * 11.90%* Amean 110 2.5820 ( 0.00%) 2.0763 * 19.58%* Amean 141 3.2203 ( 0.00%) 2.5313 * 21.40%* Amean 172 3.8200 ( 0.00%) 3.1163 * 18.42%* Amean 203 4.3357 ( 0.00%) 3.5560 * 17.98%* Amean 234 4.8047 ( 0.00%) 3.8913 * 19.01%* Amean 265 5.1243 ( 0.00%) 4.2293 * 17.47%* Amean 296 5.5940 ( 0.00%) 4.5357 * 18.92%* 5.15.0-rc3 5.15.0-rc3 sched-wakeeflips-v1r1 sched-scalewakegran-v3r2 Duration User 2567.27 2034.17 Duration System 21098.79 17137.08 Duration Elapsed 136.49 120.2 Signed-off-by: Mel Gorman --- kernel/sched/fair.c | 17 +++++++++++++++-- kernel/sched/features.h | 2 ++ 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index d00af3b97d8f..dee108470297 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7052,10 +7052,23 @@ balance_fair(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) } #endif /* CONFIG_SMP */ -static unsigned long wakeup_gran(struct sched_entity *se) +static unsigned long +wakeup_gran(struct sched_entity *curr, struct sched_entity *se) { unsigned long gran = sysctl_sched_wakeup_granularity; + if (sched_feat(SCALE_WAKEUP_GRAN)) { + unsigned long delta_exec; + + /* + * Increase the wakeup granularity if curr's runtime + * is less than the minimum preemption granularity. + */ + delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime; + if (delta_exec < sysctl_sched_min_granularity) + gran += sysctl_sched_min_granularity; + } + /* * Since its curr running now, convert the gran from real-time * to virtual-time in his units. @@ -7094,7 +7107,7 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se) if (vdiff <= 0) return -1; - gran = wakeup_gran(se); + gran = wakeup_gran(curr, se); if (vdiff > gran) return 1; diff --git a/kernel/sched/features.h b/kernel/sched/features.h index 7f8dace0964c..611591355ffd 100644 --- a/kernel/sched/features.h +++ b/kernel/sched/features.h @@ -95,3 +95,5 @@ SCHED_FEAT(LATENCY_WARN, false) SCHED_FEAT(ALT_PERIOD, true) SCHED_FEAT(BASE_SLICE, true) + +SCHED_FEAT(SCALE_WAKEUP_GRAN, true) -- 2.31.1