From: Chengming Zhou
To: hannes@cmpxchg.org, tj@kernel.org, mkoutny@suse.com, surenb@google.com
Cc: mingo@redhat.com, peterz@infradead.org, gregkh@linuxfoundation.org,
	corbet@lwn.net, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, songmuchun@bytedance.com, Chengming Zhou
Subject: [PATCH v4 06/10] sched/psi: remove NR_ONCPU task accounting
Date: Fri, 26 Aug 2022 00:41:07 +0800
Message-Id: <20220825164111.29534-7-zhouchengming@bytedance.com>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20220825164111.29534-1-zhouchengming@bytedance.com>
References: <20220825164111.29534-1-zhouchengming@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Johannes Weiner

We put all fields updated by the scheduler in the first cacheline of
struct psi_group_cpu for performance.

Since we want to add another state, PSI_IRQ_FULL, to track IRQ/SOFTIRQ
pressure, we need to reclaim space first. This patch removes the NR_ONCPU
task accounting in struct psi_group_cpu and uses one bit in state_mask to
track it instead.

Signed-off-by: Johannes Weiner
Signed-off-by: Chengming Zhou
Reviewed-by: Chengming Zhou
Tested-by: Chengming Zhou
---
 include/linux/psi_types.h | 16 +++++++--------
 kernel/sched/psi.c        | 41 ++++++++++++++++++++++++++++-----------
 2 files changed, 37 insertions(+), 20 deletions(-)

diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h
index c7fe7c089718..54cb74946db4 100644
--- a/include/linux/psi_types.h
+++ b/include/linux/psi_types.h
@@ -15,13 +15,6 @@ enum psi_task_count {
 	NR_IOWAIT,
 	NR_MEMSTALL,
 	NR_RUNNING,
-	/*
-	 * This can't have values other than 0 or 1 and could be
-	 * implemented as a bit flag. But for now we still have room
-	 * in the first cacheline of psi_group_cpu, and this way we
-	 * don't have to special case any state tracking for it.
-	 */
-	NR_ONCPU,
 	/*
 	 * For IO and CPU stalls the presence of running/oncpu tasks
 	 * in the domain means a partial rather than a full stall.
@@ -32,16 +25,18 @@ enum psi_task_count {
 	 * threads and memstall ones.
 	 */
 	NR_MEMSTALL_RUNNING,
-	NR_PSI_TASK_COUNTS = 5,
+	NR_PSI_TASK_COUNTS = 4,
 };
 
 /* Task state bitmasks */
 #define TSK_IOWAIT	(1 << NR_IOWAIT)
 #define TSK_MEMSTALL	(1 << NR_MEMSTALL)
 #define TSK_RUNNING	(1 << NR_RUNNING)
-#define TSK_ONCPU	(1 << NR_ONCPU)
 #define TSK_MEMSTALL_RUNNING	(1 << NR_MEMSTALL_RUNNING)
 
+/* Only one task can be scheduled, no corresponding task count */
+#define TSK_ONCPU	(1 << NR_PSI_TASK_COUNTS)
+
 /* Resources that workloads could be stalled on */
 enum psi_res {
 	PSI_IO,
@@ -68,6 +63,9 @@ enum psi_states {
 	NR_PSI_STATES = 7,
 };
 
+/* Use one bit in the state mask to track TSK_ONCPU */
+#define PSI_ONCPU	(1 << NR_PSI_STATES)
+
 enum psi_aggregators {
 	PSI_AVGS = 0,
 	PSI_POLL,
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index d71dbc2356ff..4702a770e272 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -212,7 +212,7 @@ void __init psi_init(void)
 	group_init(&psi_system);
 }
 
-static bool test_state(unsigned int *tasks, enum psi_states state)
+static bool test_state(unsigned int *tasks, enum psi_states state, bool oncpu)
 {
 	switch (state) {
 	case PSI_IO_SOME:
@@ -225,9 +225,9 @@ static bool test_state(unsigned int *tasks, enum psi_states state)
 		return unlikely(tasks[NR_MEMSTALL] &&
 			tasks[NR_RUNNING] == tasks[NR_MEMSTALL_RUNNING]);
 	case PSI_CPU_SOME:
-		return unlikely(tasks[NR_RUNNING] > tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] > oncpu);
 	case PSI_CPU_FULL:
-		return unlikely(tasks[NR_RUNNING] && !tasks[NR_ONCPU]);
+		return unlikely(tasks[NR_RUNNING] && !oncpu);
 	case PSI_NONIDLE:
 		return tasks[NR_IOWAIT] || tasks[NR_MEMSTALL] ||
 			tasks[NR_RUNNING];
@@ -689,9 +689,9 @@ static void psi_group_change(struct psi_group *group, int cpu,
 			     bool wake_clock)
 {
 	struct psi_group_cpu *groupc;
-	u32 state_mask = 0;
 	unsigned int t, m;
 	enum psi_states s;
+	u32 state_mask;
 
 	groupc = per_cpu_ptr(group->pcpu, cpu);
 
@@ -707,17 +707,36 @@ static void psi_group_change(struct psi_group *group, int cpu,
 
 	record_times(groupc, now);
 
+	/*
+	 * Start with TSK_ONCPU, which doesn't have a corresponding
+	 * task count - it's just a boolean flag directly encoded in
+	 * the state mask. Clear, set, or carry the current state if
+	 * no changes are requested.
+	 */
+	if (unlikely(clear & TSK_ONCPU)) {
+		state_mask = 0;
+		clear &= ~TSK_ONCPU;
+	} else if (unlikely(set & TSK_ONCPU)) {
+		state_mask = PSI_ONCPU;
+		set &= ~TSK_ONCPU;
+	} else {
+		state_mask = groupc->state_mask & PSI_ONCPU;
+	}
+
+	/*
+	 * The rest of the state mask is calculated based on the task
+	 * counts. Update those first, then construct the mask.
+	 */
 	for (t = 0, m = clear; m; m &= ~(1 << t), t++) {
 		if (!(m & (1 << t)))
 			continue;
 		if (groupc->tasks[t]) {
 			groupc->tasks[t]--;
 		} else if (!psi_bug) {
-			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u %u] clear=%x set=%x\n",
+			printk_deferred(KERN_ERR "psi: task underflow! cpu=%d t=%d tasks=[%u %u %u %u] clear=%x set=%x\n",
 					cpu, t, groupc->tasks[0],
 					groupc->tasks[1], groupc->tasks[2],
-					groupc->tasks[3], groupc->tasks[4],
-					clear, set);
+					groupc->tasks[3], clear, set);
 			psi_bug = 1;
 		}
 	}
@@ -726,9 +745,8 @@ static void psi_group_change(struct psi_group *group, int cpu,
 		if (set & (1 << t))
 			groupc->tasks[t]++;
 
-	/* Calculate state mask representing active states */
 	for (s = 0; s < NR_PSI_STATES; s++) {
-		if (test_state(groupc->tasks, s))
+		if (test_state(groupc->tasks, s, state_mask & PSI_ONCPU))
 			state_mask |= (1 << s);
 	}
 
@@ -740,7 +758,7 @@ static void psi_group_change(struct psi_group *group, int cpu,
 	 * task in a cgroup is in_memstall, the corresponding groupc
 	 * on that cpu is in PSI_MEM_FULL state.
 	 */
-	if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall))
+	if (unlikely((state_mask & PSI_ONCPU) && cpu_curr(cpu)->in_memstall))
 		state_mask |= (1 << PSI_MEM_FULL);
 
 	groupc->state_mask = state_mask;
@@ -828,7 +846,8 @@ void psi_task_switch(struct task_struct *prev, struct task_struct *next,
 		 */
 		iter = NULL;
 		while ((group = iterate_groups(next, &iter))) {
-			if (per_cpu_ptr(group->pcpu, cpu)->tasks[NR_ONCPU]) {
+			if (per_cpu_ptr(group->pcpu, cpu)->state_mask &
+			    PSI_ONCPU) {
 				common = group;
 				break;
 			}
-- 
2.37.2
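
For readers following the state-mask change outside the kernel tree, the clear/set/carry handling of TSK_ONCPU can be sketched as a small userspace C fragment. This is only an illustrative sketch, not the kernel code: the helper names oncpu_state(), cpu_some() and cpu_full() are hypothetical, the constants simply mirror the values used in the patch, and uint32_t stands in for the kernel's u32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace mirror of the patched constants (illustration only). */
enum psi_task_count {
	NR_IOWAIT,
	NR_MEMSTALL,
	NR_RUNNING,
	NR_MEMSTALL_RUNNING,
	NR_PSI_TASK_COUNTS = 4,
};
#define NR_PSI_STATES	7

/* TSK_ONCPU sits just past the counted task states ... */
#define TSK_ONCPU	(1u << NR_PSI_TASK_COUNTS)
/* ... and PSI_ONCPU just past the real PSI states in the mask. */
#define PSI_ONCPU	(1u << NR_PSI_STATES)

/*
 * Hypothetical helper: compute the ONCPU part of the new state mask.
 * Clear wins over set; with neither requested, the old bit is carried.
 * TSK_ONCPU is stripped from *clear and *set so that a later loop over
 * the remaining bits never indexes past the task counters.
 */
uint32_t oncpu_state(uint32_t old_mask, uint32_t *clear, uint32_t *set)
{
	uint32_t state_mask;

	if (*clear & TSK_ONCPU) {
		state_mask = 0;
		*clear &= ~TSK_ONCPU;
	} else if (*set & TSK_ONCPU) {
		state_mask = PSI_ONCPU;
		*set &= ~TSK_ONCPU;
	} else {
		state_mask = old_mask & PSI_ONCPU;
	}
	return state_mask;
}

/* CPU pressure tests, taking the flag as a bool instead of a counter. */
bool cpu_some(const unsigned int *tasks, bool oncpu)
{
	return tasks[NR_RUNNING] > oncpu;
}

bool cpu_full(const unsigned int *tasks, bool oncpu)
{
	return tasks[NR_RUNNING] && !oncpu;
}
```

With two runnable tasks and the ONCPU bit set, cpu_some() reports pressure while cpu_full() does not. Once the bit is cleared with tasks still runnable, cpu_full() becomes true as well.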