From: Chen Yu
To: Peter Zijlstra, Vincent Guittot, Ingo Molnar, Juri Lelli
Cc: Mel Gorman, Tim Chen, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    K Prateek Nayak, Abel Wu, Yicong Yang, Gautham R. Shenoy, Honglei Wang,
    Len Brown, Chen Yu, Tianchen Ding, Joel Fernandes, Josh Don,
    kernel test robot, Arjan Van De Ven, Aaron Lu,
    linux-kernel@vger.kernel.org, Chen Yu
Subject: [PATCH v8 1/2] sched/fair: Record the average duration of a task
Date: Sat, 29 Apr 2023 07:16:41 +0800

Record the average duration of a task, as there is a requirement to
leverage this information for better task placement.

At first glance, (p->se.sum_exec_runtime / p->nvcsw) could be used to
measure the task duration. However, long-past history is weighted too
heavily by such a formula. Ideally, old activity should decay and not
affect the current status too much.
Although something based on PELT could be used, se.util_avg might not be
appropriate to describe the task duration: if task p1 and task p2 are
doing frequent ping-pong scheduling on one CPU, both p1 and p2 have a
short duration, but the util_avg of each task can be up to 50%, which is
inconsistent with the short task duration.

It was found that there was once a similar feature to track the duration
of a task:

  commit ad4b78bbcbab ("sched: Add new wakeup preemption mode: WAKEUP_RUNNING")

Unfortunately, it was reverted because it was an experiment. Pick the
patch up again, by recording the average duration when a task
voluntarily switches out.

Suppose on CPU1, task p1 and p2 run alternatively:

 --------------------> time

 | p1 runs 1ms | p2 preempt p1 | p1 switch in, runs 0.5ms and blocks |
 ^             ^               ^                                     ^
 |_____________|               |_____________________________________|
                                                                     ^
                                                                     |
                                                           p1 dequeued

p1's duration in one section is (1 + 0.5)ms. Because if p2 did not
preempt p1, p1 could have run for 1.5ms. This reflects the nature of a
task: how long it wishes to run at most.
Suggested-by: Tim Chen
Suggested-by: Vincent Guittot
Tested-by: K Prateek Nayak
Signed-off-by: Chen Yu
---
 include/linux/sched.h | 3 +++
 kernel/sched/core.c   | 2 ++
 kernel/sched/debug.c  | 1 +
 kernel/sched/fair.c   | 13 +++++++++++++
 4 files changed, 19 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 675298d6eb36..6ee6b00faa12 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -558,6 +558,9 @@ struct sched_entity {
 	u64				prev_sum_exec_runtime;
 	u64				nr_migrations;
 
+	u64				prev_sleep_sum_runtime;
+	/* average duration of a task */
+	u64				dur_avg;
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	int				depth;
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 898fa3bc2765..32eacd220e39 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4452,6 +4452,8 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.prev_sum_exec_runtime	= 0;
 	p->se.nr_migrations		= 0;
 	p->se.vruntime			= 0;
+	p->se.dur_avg			= 0;
+	p->se.prev_sleep_sum_runtime	= 0;
 	INIT_LIST_HEAD(&p->se.group_node);
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 1637b65ba07a..8d64fba16cfe 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -1024,6 +1024,7 @@ void proc_sched_show_task(struct task_struct *p, struct pid_namespace *ns,
 	__PS("nr_involuntary_switches", p->nivcsw);
 
 	P(se.load.weight);
+	P(se.dur_avg);
 #ifdef CONFIG_SMP
 	P(se.avg.load_sum);
 	P(se.avg.runnable_sum);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3f8135d7c89d..3236011658a2 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6319,6 +6319,18 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 static void set_next_buddy(struct sched_entity *se);
 
+static inline void dur_avg_update(struct task_struct *p, bool task_sleep)
+{
+	u64 dur;
+
+	if (!task_sleep)
+		return;
+
+	dur = p->se.sum_exec_runtime - p->se.prev_sleep_sum_runtime;
+	p->se.prev_sleep_sum_runtime = p->se.sum_exec_runtime;
+	update_avg(&p->se.dur_avg, dur);
+}
+
 /*
  * The dequeue_task method is called before nr_running is
  * decreased. We remove the task from the rbtree and
@@ -6391,6 +6403,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
 
 dequeue_throttle:
 	util_est_update(&rq->cfs, p, task_sleep);
+	dur_avg_update(p, task_sleep);
 	hrtick_update(rq);
 }
 
-- 
2.25.1