Date: Sat, 06 Mar 2021 11:42:18 -0000
From: "tip-bot2 for Chengming Zhou"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: sched/core] psi: Use ONCPU state tracking machinery to detect reclaim
Cc: Muchun Song, Chengming Zhou, "Peter Zijlstra (Intel)", Ingo Molnar, Johannes Weiner, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20210303034659.91735-3-zhouchengming@bytedance.com>
References: <20210303034659.91735-3-zhouchengming@bytedance.com>
Message-ID: <161503093880.398.8358468486880713332.tip-bot2@tip-bot2>
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     7fae6c8171d20ac55402930ee8ae760cf85dff7b
Gitweb:        https://git.kernel.org/tip/7fae6c8171d20ac55402930ee8ae760cf85dff7b
Author:        Chengming Zhou
AuthorDate:    Wed, 03 Mar 2021 11:46:57 +08:00
Committer:     Ingo Molnar
CommitterDate: Sat, 06 Mar 2021 12:40:22 +01:00

psi: Use ONCPU state tracking machinery to detect reclaim

Move the reclaim detection from the timer tick to the task state
tracking machinery using the recently added ONCPU state. And we
also add task psi_flags changes checking in the psi_task_switch()
optimization to update the parents properly.

In terms of performance and cost, this ONCPU task state tracking
is not cheaper than previous timer tick in aggregate. But the code is
simpler and shorter this way, so it's a maintainability win.
And Johannes did some testing with perf bench, the performance and cost
changes would be acceptable for real workloads.

Thanks to Johannes Weiner for pointing out the psi_task_switch()
optimization things and the clearer changelog.

Co-developed-by: Muchun Song
Signed-off-by: Muchun Song
Signed-off-by: Chengming Zhou
Signed-off-by: Peter Zijlstra (Intel)
Signed-off-by: Ingo Molnar
Acked-by: Johannes Weiner
Link: https://lkml.kernel.org/r/20210303034659.91735-3-zhouchengming@bytedance.com
---
 include/linux/psi.h  |  1 +-
 kernel/sched/core.c  |  1 +-
 kernel/sched/psi.c   | 65 +++++++++++++++----------------------------
 kernel/sched/stats.h |  9 +------
 4 files changed, 24 insertions(+), 52 deletions(-)

diff --git a/include/linux/psi.h b/include/linux/psi.h
index 7361023..65eb147 100644
--- a/include/linux/psi.h
+++ b/include/linux/psi.h
@@ -20,7 +20,6 @@ void psi_task_change(struct task_struct *task, int clear, int set);
 void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		     bool sleep);

-void psi_memstall_tick(struct task_struct *task, int cpu);
 void psi_memstall_enter(unsigned long *flags);
 void psi_memstall_leave(unsigned long *flags);

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 361974e..d2629fd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4551,7 +4551,6 @@ void scheduler_tick(void)
 	update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
 	curr->sched_class->task_tick(rq, curr, 0);
 	calc_global_load_tick(rq);
-	psi_task_tick(rq);

 	rq_unlock(rq, &rf);

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 2293c45..0fe6ff6 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -644,8 +644,7 @@ static void poll_timer_fn(struct timer_list *t)
 	wake_up_interruptible(&group->poll_wait);
 }

-static void record_times(struct psi_group_cpu *groupc, int cpu,
-			 bool memstall_tick)
+static void record_times(struct psi_group_cpu *groupc, int cpu)
 {
 	u32 delta;
 	u64 now;
@@ -664,23 +663,6 @@ static void record_times(struct psi_group_cpu *groupc, int cpu,
 		groupc->times[PSI_MEM_SOME] += delta;
 		if (groupc->state_mask & (1 << PSI_MEM_FULL))
 			groupc->times[PSI_MEM_FULL] += delta;
-		else if (memstall_tick) {
-			u32 sample;
-			/*
-			 * Since we care about lost potential, a
-			 * memstall is FULL when there are no other
-			 * working tasks, but also when the CPU is
-			 * actively reclaiming and nothing productive
-			 * could run even if it were runnable.
-			 *
-			 * When the timer tick sees a reclaiming CPU,
-			 * regardless of runnable tasks, sample a FULL
-			 * tick (or less if it hasn't been a full tick
-			 * since the last state change).
-			 */
-			sample = min(delta, (u32)jiffies_to_nsecs(1));
-			groupc->times[PSI_MEM_FULL] += sample;
-		}
 	}

 	if (groupc->state_mask & (1 << PSI_CPU_SOME)) {
@@ -714,7 +696,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	 */
 	write_seqcount_begin(&groupc->seq);

-	record_times(groupc, cpu, false);
+	record_times(groupc, cpu);

 	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
 		if (!(m & (1 << t)))
@@ -738,6 +720,18 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (test_state(groupc->tasks, s))
 			state_mask |= (1 << s);
 	}
+
+	/*
+	 * Since we care about lost potential, a memstall is FULL
+	 * when there are no other working tasks, but also when
+	 * the CPU is actively reclaiming and nothing productive
+	 * could run even if it were runnable. So when the current
+	 * task in a cgroup is in_memstall, the corresponding groupc
+	 * on that cpu is in PSI_MEM_FULL state.
+	 */
+	if (groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall)
+		state_mask |= (1 << PSI_MEM_FULL);
+
 	groupc->state_mask = state_mask;

 	write_seqcount_end(&groupc->seq);
@@ -823,17 +817,21 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	void *iter;

 	if (next->pid) {
+		bool identical_state;
+
 		psi_flags_change(next, 0, TSK_ONCPU);
 		/*
-		 * When moving state between tasks, the group that
-		 * contains them both does not change: we can stop
-		 * updating the tree once we reach the first common
-		 * ancestor. Iterate @next's ancestors until we
-		 * encounter @prev's state.
+		 * When switching between tasks that have an identical
+		 * runtime state, the cgroup that contains both tasks
+		 * runtime state, the cgroup that contains both tasks
+		 * we reach the first common ancestor. Iterate @next's
+		 * ancestors only until we encounter @prev's ONCPU.
 		 */
+		identical_state = prev->psi_flags == next->psi_flags;
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (identical_state &&
+			    per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
 				common = group;
 				break;
 			}
@@ -859,21 +857,6 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 	}
 }

-void psi_memstall_tick(struct task_struct *task, int cpu)
-{
-	struct psi_group *group;
-	void *iter = NULL;
-
-	while ((group = iterate_groups(task, &iter))) {
-		struct psi_group_cpu *groupc;
-
-		groupc = per_cpu_ptr(group->pcpu, cpu);
-		write_seqcount_begin(&groupc->seq);
-		record_times(groupc, cpu, true);
-		write_seqcount_end(&groupc->seq);
-	}
-}
-
 /**
  * psi_memstall_enter - mark the beginning of a memory stall section
  * @flags: flags to handle nested sections
diff --git a/kernel/sched/stats.h b/kernel/sched/stats.h
index 33d0daf..9e4e67a 100644
--- a/kernel/sched/stats.h
+++ b/kernel/sched/stats.h
@@ -144,14 +144,6 @@ static inline void psi_sched_switch(struct task_struct *prev,
 	psi_task_switch(prev, next, sleep);
 }

-static inline void psi_task_tick(struct rq *rq)
-{
-	if (static_branch_likely(&psi_disabled))
-		return;
-
-	if (unlikely(rq->curr->in_memstall))
-		psi_memstall_tick(rq->curr, cpu_of(rq));
-}
 #else /* CONFIG_PSI */
 static inline void psi_enqueue(struct task_struct *p, bool wakeup) {}
 static inline void psi_dequeue(struct task_struct *p, bool sleep) {}
@@ -159,7 +151,6 @@ static inline void psi_ttwu_dequeue(struct task_struct *p) {}
 static inline void psi_sched_switch(struct task_struct *prev,
 				    struct task_struct *next,
 				    bool sleep) {}
-static inline void psi_task_tick(struct rq *rq) {}
 #endif /* CONFIG_PSI */

 #ifdef CONFIG_SCHED_INFO