Received: by 2002:a05:6358:4e97:b0:b3:742d:4702 with SMTP id ce23csp3619893rwb; Tue, 16 Aug 2022 06:17:46 -0700 (PDT) X-Google-Smtp-Source: AA6agR6sqN4UvI4OkktPnI+WlShtarTxd6uvv4mi/LXCTPPGLJbBBLlzAmh4WdnAKrc9ZVXO0Nyt X-Received: by 2002:a17:906:dc89:b0:731:67eb:b60b with SMTP id cs9-20020a170906dc8900b0073167ebb60bmr13094397ejc.614.1660655866016; Tue, 16 Aug 2022 06:17:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1660655866; cv=none; d=google.com; s=arc-20160816; b=vEQv/Xg/yeil+TAdss+sAugl8BN4WJDZikbpH3l1VLY1y7SNsHUqDhNeliaT4K4N8i qBddHcEujx+MsO2wOMdIifZVuVNm6UWPYP5/eT7G9OW8CZXpvEt7mToOFZAnHET/3bub Az3br/yt8AeC2tbkH0es6W7kbvboHUpQZYXimG1mvA41nK+1vKq8fzPKE0xQC8yMzbeF tWnKawcZgDxP0VBJvIrmSUa7jb1frlVO58kZFmlipK2UG+ES7L8L4XT+mnma6PcLP1KT EstKIvWROkzbKKsg6lrFaYm5B5OvUgknX9N+A82+VKu1Y3OVQdeKMCOvTGfNgQRSJD9r iLsA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:from :references:cc:to:content-language:subject:user-agent:mime-version :date:message-id:dkim-signature; bh=+6Yo80pVz7R1wbuugd0h39oHv4cQk8t0dTgRXP4VwBI=; b=wCijFX1xFYroaZbWl4HQ3B6q6vgqNRqID7Xkxk7+lG9hByFQBi3LI+OOBCMDNgWtlV zKS6FsP1w/vm+MZUi0TwB07hvU+N9FyXg2NkjEpYlh9wjcL2azgmoA+EkMg+zRGiyUcQ /RDbDPi5QUA0Ez/zWdwvgo9Ot/tSTPr5wsZDs7nZgngyLp6yNDF4+lQsIvVWHLGQRd+m gOkGo3NVYGX0G4l7849WHeQTOnnt42gzyDKRxsjWP8T1EhOmHdhzj6Sj7C1BUiEYyYe+ huj2bnhCs8FsJkVQWetKteTh2JHxki1i8qioN6CmKbC6K4+9eW1Xxs7jtTCeA64Uvvxh Yb6w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Bm1NhmX8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n8-20020a05640205c800b0043a25094b50si11636981edx.552.2022.08.16.06.17.18; Tue, 16 Aug 2022 06:17:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=Bm1NhmX8; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235449AbiHPNGx (ORCPT + 99 others); Tue, 16 Aug 2022 09:06:53 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:56620 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235008AbiHPNGe (ORCPT ); Tue, 16 Aug 2022 09:06:34 -0400 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B1571639D for ; Tue, 16 Aug 2022 06:06:32 -0700 (PDT) Received: by mail-pj1-x102b.google.com with SMTP id e8-20020a17090a280800b001f2fef7886eso9540675pjd.3 for ; Tue, 16 Aug 2022 06:06:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc; bh=+6Yo80pVz7R1wbuugd0h39oHv4cQk8t0dTgRXP4VwBI=; b=Bm1NhmX8UPNf1PtLuZHAgWfhESmLqjvmDF7oLG7Sjrob5ZBRioBQbZD8IbmfhUAuPy 26Pg0XP9e55a3u5kuQ5ryXTN1eqIwUYiGNTjeFnhmdbLM55bc0FxevMk8+WBBXcBYJT5 CC+fEv2AnVaO1iGODXOqJPnAoaTxd2O8NqgXkv2mZ6mBh9kYG0x+9EMHXgGOCIk3qAR1 n1LKoZW0wxTfg0Fcd1tVZGBrJarWd6ATeB6MjzrMVXM8Uq9zR3Tc0GPRV7NUamgreV0Y sIsQn/MuuGInwkkJosYPYxp8cPCoQUm4kieEJnct/L+ITnY3CRyDKA7UUzqyiZWg54Tf 3LGA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc; bh=+6Yo80pVz7R1wbuugd0h39oHv4cQk8t0dTgRXP4VwBI=; b=ETwr582AViLDMQRSErtR3tsW1NdwKvTXztnljnpOq1Dmkml2eiH0tubQ+IJAIuRqTY OnLoBJ6Wr21dWn0x25P7hKv5CNWse9NaFIppNSTCbd2HtBF9UNkXGosgcys/eBM4/Rlg l6W9GdperpLti4/axQ+esPoJv9rpHRocIZqNlJ3O2b2YIocYqW7ctxFtSdm3JI6tOoUZ XPMkiDuDtL8hlG9NSkBwp5IBfNp/d8S8XS7dCO3vDN699X55L0GOiyIETf5T6fAOKEb7 HoEKY45YHgtCv0KHVHlosFU0ymhCK3InIaF0oroMv3+oWz5HK8MaAknaZGZawX21YBxf 77Rw== X-Gm-Message-State: ACgBeo1uUDxT8NTyO5FqUGqfPn/MoZ2FA0iwhYjk08fCt63/Q3Nq/Lv2 /1Rl6sLNEMnUROp5C299TBsmeeYBuEZNqg== X-Received: by 2002:a17:902:d58d:b0:171:5880:3287 with SMTP id k13-20020a170902d58d00b0017158803287mr20604260plh.9.1660655192117; Tue, 16 Aug 2022 06:06:32 -0700 (PDT) Received: from [10.4.196.37] ([139.177.225.255]) by smtp.gmail.com with ESMTPSA id j15-20020a170902da8f00b0016f04c098ddsm9039093plx.226.2022.08.16.06.06.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 16 Aug 2022 06:06:31 -0700 (PDT) Message-ID: <904851a7-7b01-8689-3ec1-2a61f8244841@bytedance.com> Date: Tue, 16 Aug 2022 21:06:21 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.1.2 Subject: Re: [PATCH v2 09/10] sched/psi: per-cgroup PSI stats disable/re-enable interface Content-Language: en-US To: Johannes Weiner Cc: tj@kernel.org, corbet@lwn.net, surenb@google.com, mingo@redhat.com, peterz@infradead.org, vincent.guittot@linaro.org, dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, songmuchun@bytedance.com References: <20220808110341.15799-1-zhouchengming@bytedance.com> <20220808110341.15799-10-zhouchengming@bytedance.com> From: Chengming Zhou In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/8/15 23:49, Johannes Weiner wrote: > On Mon, Aug 08, 2022 at 07:03:40PM +0800, Chengming Zhou wrote: >> +static ssize_t cgroup_psi_write(struct kernfs_open_file *of, >> + char *buf, size_t nbytes, loff_t off) >> +{ >> + ssize_t ret; >> + int enable; >> + struct cgroup *cgrp; >> + struct psi_group *psi; >> + >> + ret = kstrtoint(strstrip(buf), 0, &enable); >> + if (ret) >> + return ret; >> + >> + if (enable < 0 || enable > 1) >> + return -ERANGE; >> + >> + cgrp = cgroup_kn_lock_live(of->kn, false); >> + if (!cgrp) >> + return -ENOENT; >> + >> + psi = cgroup_ino(cgrp) == 1 ? &psi_system : &cgrp->psi; >> + psi_cgroup_enable(psi, enable); > > I think it should also add/remove the pressure files when enabling and > disabling the aggregation, since their contents would be stale and > misleading. > > Take a look at cgroup_add_dfl_cftypes() and cgroup_rm_cftypes() Ok, I will look. > >> @@ -5115,6 +5152,12 @@ static struct cftype cgroup_base_files[] = { >> .release = cgroup_pressure_release, >> }, >> #endif >> + { >> + .name = "cgroup.psi", >> + .flags = CFTYPE_PRESSURE, >> + .seq_show = cgroup_psi_show, >> + .write = cgroup_psi_write, >> + }, >> #endif /* CONFIG_PSI */ >> { } /* terminate */ >> }; >> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c >> index 58f8092c938f..9df1686ee02d 100644 >> --- a/kernel/sched/psi.c >> +++ b/kernel/sched/psi.c >> @@ -181,6 +181,7 @@ static void group_init(struct psi_group *group) >> { >> int cpu; >> >> + group->enabled = true; >> for_each_possible_cpu(cpu) >> seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq); >> group->avg_last_update = sched_clock(); >> @@ -700,17 +701,16 @@ static void psi_group_change(struct psi_group *group, int cpu, >> groupc = per_cpu_ptr(group->pcpu, cpu); >> >> /* >> - * First we assess the aggregate resource states this CPU's >> - * tasks have been in since the last change, and account any >> - * SOME and FULL time these may have resulted in. >> - * >> - * Then we update the task counts according to the state >> + * First we update the task counts according to the state >> * change requested through the @clear and @set bits. >> + * >> + * Then if the cgroup PSI stats accounting enabled, we >> + * assess the aggregate resource states this CPU's tasks >> + * have been in since the last change, and account any >> + * SOME and FULL time these may have resulted in. >> */ >> write_seqcount_begin(&groupc->seq); >> >> - record_times(groupc, now); >> - >> /* >> * Start with TSK_ONCPU, which doesn't have a corresponding >> * task count - it's just a boolean flag directly encoded in >> @@ -750,6 +750,14 @@ static void psi_group_change(struct psi_group *group, int cpu, >> if (set & (1 << t)) >> groupc->tasks[t]++; >> >> + if (!group->enabled) { >> + if (groupc->state_mask & (1 << PSI_NONIDLE)) >> + record_times(groupc, now); > > Why record the nonidle time? It's only used for aggregation, which is > stopped as well. I'm considering of this situation: disable at t2 and re-enable at t3 state1(t1) --> state2(t2) --> state3(t3) If aggregator has get_recent_times() in [t1, t2], groupc->times_prev[aggregator] will include that delta of (t - t1). Then re-enable at t3, the delta of (t3-t1) is discarded, may make that aggregator see times < groupc->times_prev[aggregator] ? Maybe I missed something, not sure whether this is a problem. > >> @@ -1088,6 +1097,23 @@ void cgroup_move_task(struct task_struct *task, struct css_set *to) >> >> task_rq_unlock(rq, task, &rf); >> } >> + >> +void psi_cgroup_enable(struct psi_group *group, bool enable) >> +{ >> + struct psi_group_cpu *groupc; >> + int cpu; >> + u64 now; >> + >> + if (group->enabled == enable) >> + return; >> + group->enabled = enable; >> + >> + for_each_possible_cpu(cpu) { >> + groupc = per_cpu_ptr(group->pcpu, cpu); >> + now = cpu_clock(cpu); >> + psi_group_change(group, cpu, 0, 0, now, true); > > This loop deserves a comment, IMO. I add some comments as below, could you help take a look? + +void psi_cgroup_enable(struct psi_group *group, bool enable) +{ + int cpu; + u64 now; + + if (group->enabled == enable) + return; + group->enabled = enable; + + /* + * We use psi_group_change() to disable or re-enable the + * record_times(), test_state() loop and averaging worker + * in each psi_group_cpu of the psi_group, use .clear = 0 + * and .set = 0 here since no task status really changed. + */ + for_each_possible_cpu(cpu) { + now = cpu_clock(cpu); + psi_group_change(group, cpu, 0, 0, now, true); + } +} Thanks!