Received: by 2002:a05:6359:c8b:b0:c7:702f:21d4 with SMTP id go11csp3588180rwb; Sun, 9 Oct 2022 07:38:56 -0700 (PDT) X-Google-Smtp-Source: AMsMyM7nHIQs5+bpYoYe2TkfR7gbTHOD/reIE1Ty7cfTRD5YNcpTUSvgifCozJfSLbZeIVAt5rEL X-Received: by 2002:a17:906:6791:b0:78d:4051:fcf0 with SMTP id q17-20020a170906679100b0078d4051fcf0mr11120831ejp.591.1665326336489; Sun, 09 Oct 2022 07:38:56 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665326336; cv=none; d=google.com; s=arc-20160816; b=u7fYTU3POYRP9dmO+Usr4XYo8sJwAez4i5RbrM6dSUZr79r5cSXoDDW92mxUpkVkvI Pn1I8Dl8D4tDXkpEXsM8F3MYgKeNPyWhdCI6jWjOS1KsYCzTz2U10Fvkt1hCDFBYd0gs zyT983iKSGXrlpfbTVmVx/ZFu+f1eXWvi1TCwIQj5GAoPHC2CMwPle1UUlDSQBOwI/Gm +DkE67kmda204kL1N59ik018/OYjs7aVeLJX6aUMXqOhXEiw/HwMy6E91XEMRrhZKKQi StOFjY50hh7U3wdfMcDFar9Ope26jeufvKWeqI5ggW1K319AaTRAWSEgu+VRLfCQJHEV LN7Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:in-reply-to:references :cc:to:from:content-language:subject:user-agent:mime-version:date :message-id:dkim-signature; bh=o8DJS0Bp8z45upCAfsSfSsEEeWWUheV9TXrd7HE4E5Q=; b=nbs25Aw0oHuwAG9ehDDkXRiBACB+YeRgN6fhjWrewpZnN+z4g7uNhw8OQXmrgcXtuE Ym8xYCXGJ1mCuGD48XaG6XHBb1WDnCVxzI3FfdR4ey3mk0vsgKNHqwaMhKqoFWgdqxON 5Y0ZTkBe6FvaYi4kyVfHYwa4PJkWoJ9XII2OwMW+VhDKIfwxytzMbDlvIYXqQjKW6NND 3pbFn5IDokqYbyxjleN3N9CVwi7f/hcopunVA0wC/mDcuVfgP+Arduc+suRqUeNza4/r hzaGRHSY7rprfpxnY1aEWuhVDrhhf79H5ARG2efmHoIV9HeeJsJ9LqPIx1/XvDv12BiX 20Sw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=HabQFMpe; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id xc4-20020a170907074400b0078d9c0e8971si3714650ejb.752.2022.10.09.07.38.30; Sun, 09 Oct 2022 07:38:56 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance-com.20210112.gappssmtp.com header.s=20210112 header.b=HabQFMpe; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230003AbiJINRo (ORCPT + 99 others); Sun, 9 Oct 2022 09:17:44 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54988 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229749AbiJINRm (ORCPT ); Sun, 9 Oct 2022 09:17:42 -0400 Received: from mail-pl1-x633.google.com (mail-pl1-x633.google.com [IPv6:2607:f8b0:4864:20::633]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77338237F3 for ; Sun, 9 Oct 2022 06:17:40 -0700 (PDT) Received: by mail-pl1-x633.google.com with SMTP id l4so8303685plb.8 for ; Sun, 09 Oct 2022 06:17:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=o8DJS0Bp8z45upCAfsSfSsEEeWWUheV9TXrd7HE4E5Q=; b=HabQFMpeWkb0G9twyGKYTxVx3VK2Y0/WFt2RsmoBIsBD6hPeQylWAWegVnix/khHbi gOrS2QCBvW2qK5mQ2Q6R4zhju5Xli6bItriGACBPMtq5pujMzxEvkROCVd2EGeUQhwrg Z+Rl49ecGsI/JbmleX02H77azPRiQZOqHOWqL6Kzw+hejCkOf6AecvQmlegwua2AAZwA aIp4Df7dKKJDn5H9fODP68s1Pi3rSA+BybbK82HeJKVf46IjMktDe3XVds4ZOQV7QDhL uHo1jYQfLH1jHhfxQHorpl18sS+gfXa+EiL3AC25/oUk6CRNggNrHl6fKgiGqtIUuGAi Q+ag== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:references:cc:to:from :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=o8DJS0Bp8z45upCAfsSfSsEEeWWUheV9TXrd7HE4E5Q=; b=Il6EfUFlNdwUVeMZUjyeGWqqWh1jWwm6umN44Qw+2fEO7WPWUH6QN6WlQyNR0cCtPZ IjOR3zrxaehpu4A0NxhyHN9sU0d4nUzK2Wirfx4OPTCKZrOMmNhX0SDJRJtBW0XSPOmm OcfgbkYWnUJCCGxbylbwPI+qlaJ/ihBBAi+FH7jZdqFL7dyLY651QXtnzxTfT9ThvQ5s 8e3tonpga0XvoZZw/qRG/s19R7SYbbS8kCfSxyZayGh8ZFTZbgCB+i3EjY3G5y3MlIJj HlUzqGJKLVF0TErO8NoNxf9irKxg+AyZIE9XPWkz1QcxBRJ2mix2cFreEjxoDurE5Bne LDTQ== X-Gm-Message-State: ACrzQf0OPQjxuasH4UjfbtTe1hBIeB9pj5bgYcaSLiqFknEpF3AEjKSO ZUHlO1lVB/FEzx/IyP0VDRxhzA== X-Received: by 2002:a17:902:b945:b0:181:c6b6:abc with SMTP id h5-20020a170902b94500b00181c6b60abcmr3524401pls.75.1665321459931; Sun, 09 Oct 2022 06:17:39 -0700 (PDT) Received: from [10.70.253.98] ([139.177.225.246]) by smtp.gmail.com with ESMTPSA id g5-20020aa796a5000000b00561d79f1064sm4958765pfk.57.2022.10.09.06.17.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 09 Oct 2022 06:17:38 -0700 (PDT) Message-ID: Date: Sun, 9 Oct 2022 21:17:34 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.3.1 Subject: Re: PSI idle-shutoff Content-Language: en-US From: Chengming Zhou To: Suren Baghdasaryan , Pavan Kondeti , Johannes Weiner Cc: LKML , Charan Teja Kalla References: <20220913140817.GA9091@hu-pkondeti-hyd.qualcomm.com> <20220915062027.GA14713@hu-pkondeti-hyd.qualcomm.com> <43f4d1c3-52fe-5254-7d50-c420de6d11a6@bytedance.com> In-Reply-To: <43f4d1c3-52fe-5254-7d50-c420de6d11a6@bytedance.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Spam-Status: No, score=-5.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,NICE_REPLY_A,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2022/10/9 20:41, Chengming Zhou wrote: > Hello, > > I just saw these emails, sorry for late. > > On 2022/10/6 00:32, Suren Baghdasaryan wrote: >> On Sun, Oct 2, 2022 at 11:11 PM Suren Baghdasaryan wrote: >>> >>> On Fri, Sep 16, 2022 at 10:45 PM Suren Baghdasaryan wrote: >>>> >>>> On Wed, Sep 14, 2022 at 11:20 PM Pavan Kondeti >>>> wrote: >>>>> >>>>> On Tue, Sep 13, 2022 at 07:38:17PM +0530, Pavan Kondeti wrote: >>>>>> Hi >>>>>> >>>>>> The fact that psi_avgs_work()->collect_percpu_times()->get_recent_times() >>>>>> run from a kworker thread, PSI_NONIDLE condition would be observed as >>>>>> there is a RUNNING task. So we would always end up re-arming the work. >>>>>> >>>>>> If the work is re-armed from the psi_avgs_work() it self, the backing off >>>>>> logic in psi_task_change() (will be moved to psi_task_switch soon) can't >>>>>> help. The work is already scheduled. so we don't do anything there. >>>> >>>> Hi Pavan, >>>> Thanks for reporting the issue. IIRC [1] was meant to fix exactly this >>>> issue. At the time it was written I tested it and it seemed to work. >>>> Maybe I missed something or some other change introduced afterwards >>>> affected the shutoff logic. I'll take a closer look next week when I'm >>>> back at my computer and will consult with Johannes. >>> >>> Sorry for the delay. I had some time to look into this and test psi >>> shutoff on my device and I think you are right. The patch I mentioned >>> prevents new psi_avgs_work from being scheduled when the only non-idle >>> task is psi_avgs_work itself, however the regular 2sec averaging work >>> will still go on. I think we could record the fact that the only >>> active task is psi_avgs_work in record_times() using a new >>> psi_group_cpu.state_mask flag and then prevent psi_avgs_work() from >>> rescheduling itself if that flag is set for all non-idle cpus. I'll >>> test this approach and will post a patch for review if that works. >> >> Hi Pavan, >> Testing PSI shutoff on Android proved more difficult than I expected. >> Lots of tasks to silence and I keep encountering new ones. >> The approach I was thinking about is something like this: >> >> --- >> include/linux/psi_types.h | 3 +++ >> kernel/sched/psi.c | 12 +++++++++--- >> 2 files changed, 12 insertions(+), 3 deletions(-) >> >> diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h >> index c7fe7c089718..8d936f22cb5b 100644 >> --- a/include/linux/psi_types.h >> +++ b/include/linux/psi_types.h >> @@ -68,6 +68,9 @@ enum psi_states { >> NR_PSI_STATES = 7, >> }; >> >> +/* state_mask flag to keep re-arming averaging work */ >> +#define PSI_STATE_WAKE_CLOCK (1 << NR_PSI_STATES) >> + >> enum psi_aggregators { >> PSI_AVGS = 0, >> PSI_POLL, >> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c >> index ecb4b4ff4ce0..dd62ad28bacd 100644 >> --- a/kernel/sched/psi.c >> +++ b/kernel/sched/psi.c >> @@ -278,6 +278,7 @@ static void get_recent_times(struct psi_group >> *group, int cpu, >> if (delta) >> *pchanged_states |= (1 << s); >> } >> + *pchanged_states |= (state_mask & PSI_STATE_WAKE_CLOCK); > > If the avgs_work kworker is running on this CPU, it will still see > PSI_STATE_WAKE_CLOCK set in state_mask? So the work will be re-armed? > > Maybe I missed something... but I have another different idea which > I want to implement later only for discussion. diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index ee2ecc081422..f322e8fd8d41 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -241,11 +241,13 @@ static void get_recent_times(struct psi_group *group, int cpu, enum psi_aggregators aggregator, u32 *times, u32 *pchanged_states) { + int current_cpu = raw_smp_processor_id(); struct psi_group_cpu *groupc = per_cpu_ptr(group->pcpu, cpu); u64 now, state_start; enum psi_states s; unsigned int seq; u32 state_mask; + bool only_avgs_work = false; *pchanged_states = 0; @@ -256,6 +258,14 @@ static void get_recent_times(struct psi_group *group, int cpu, memcpy(times, groupc->times, sizeof(groupc->times)); state_mask = groupc->state_mask; state_start = groupc->state_start; + /* + * This CPU has only avgs_work kworker running, snapshot the + * newest times then don't need to re-arm work for this groupc. + * Normally this kworker will sleep soon and won't + * wake_clock in psi_group_change(). + */ + if (current_cpu == cpu && groupc->tasks[NR_RUNNING] == 1) + only_avgs_work = true; } while (read_seqcount_retry(&groupc->seq, seq)); /* Calculate state time deltas against the previous snapshot */ @@ -280,6 +290,10 @@ static void get_recent_times(struct psi_group *group, int cpu, if (delta) *pchanged_states |= (1 << s); } + + /* Clear PSI_NONIDLE so avgs_work won't be re-armed for this groupc */ + if (only_avgs_work) + *pchanged_states &= ~(1 << PSI_NONIDLE); } static void calc_avgs(unsigned long avg[3], int missed_periods, > > Thanks. > >> } >> >> static void calc_avgs(unsigned long avg[3], int missed_periods, >> @@ -413,7 +414,7 @@ static void psi_avgs_work(struct work_struct *work) >> struct delayed_work *dwork; >> struct psi_group *group; >> u32 changed_states; >> - bool nonidle; >> + bool wake_clock; >> u64 now; >> >> dwork = to_delayed_work(work); >> @@ -424,7 +425,7 @@ static void psi_avgs_work(struct work_struct *work) >> now = sched_clock(); >> >> collect_percpu_times(group, PSI_AVGS, &changed_states); >> - nonidle = changed_states & (1 << PSI_NONIDLE); >> + wake_clock = changed_states & PSI_STATE_WAKE_CLOCK; >> /* >> * If there is task activity, periodically fold the per-cpu >> * times and feed samples into the running averages. If things >> @@ -435,7 +436,7 @@ static void psi_avgs_work(struct work_struct *work) >> if (now >= group->avg_next_update) >> group->avg_next_update = update_averages(group, now); >> >> - if (nonidle) { >> + if (wake_clock) { >> schedule_delayed_work(dwork, nsecs_to_jiffies( >> group->avg_next_update - now) + 1); >> } >> @@ -742,6 +743,11 @@ static void psi_group_change(struct psi_group >> *group, int cpu, >> if (unlikely(groupc->tasks[NR_ONCPU] && cpu_curr(cpu)->in_memstall)) >> state_mask |= (1 << PSI_MEM_FULL); >> >> + if (wake_clock || test_state(groupc->tasks, PSI_NONIDLE)) { >> + /* psi_avgs_work was not the only task on the CPU */ >> + state_mask |= PSI_STATE_WAKE_CLOCK; >> + } >> + >> groupc->state_mask = state_mask; >> >> write_seqcount_end(&groupc->seq);