From: Zhaoyang Huang
Date: Wed, 10 Nov 2021 09:37:00 +0800
Subject: Re: [Resend PATCH] psi : calc cfs task memstall time more precisely
To: Peter Zijlstra, Vincent Guittot, Dietmar Eggemann, Ke Wang, xuewen.yan@unisoc.com
Cc: Johannes Weiner, Andrew Morton, Michal Hocko, Vladimir Davydov, Zhaoyang Huang, "open list:MEMORY MANAGEMENT", LKML
References: <1634278612-17055-1-git-send-email-huangzhaoyang@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Nov 9, 2021 at 10:56 PM Peter Zijlstra wrote:
>
> On Tue, Nov 02, 2021 at 03:47:33PM -0400, Johannes Weiner wrote:
> > CC peterz as well for rt and timekeeping magic
> >
> > On Fri, Oct 15, 2021 at 02:16:52PM +0800, Huangzhaoyang wrote:
> > > From: Zhaoyang Huang
> > >
> > > In an EAS-enabled system, there are two scenarios that conflict with the current design:
> > >
> > > 1. The workload tends to be heavily uneven among cores because of scheduler policy;
> > > RT tasks usually preempt CFS tasks on little cores.
> > > 2. A CFS task's memstall time is currently counted simply as exit - entry, which
> > > ignores the time it spends preempted by RT, DL and IRQs.
>
> It ignores preemption full-stop. I don't see why RT/IRQ should be
> special cased here.

As Johannes commented, what we are trying to solve is mainly the time a CFS
task spends preempted by RT/IRQ, NOT the RT/IRQ time itself. Could you please
catch up on Dietmar's recent reply, which may provide more information?

> > > With these two constraints, the per-CPU non-idle time would be mainly consumed by
> > > non-CFS tasks and could not be averaged. Eliminate this by calculating the time growth
> > > via the proportion of the cfs_rq's utilization on the whole rq.
> > > +static unsigned long psi_memtime_fixup(u32 growth)
> > > +{
> > > +	struct rq *rq = task_rq(current);
> > > +	unsigned long growth_fixed = (unsigned long)growth;
> > > +
> > > +	if (!(current->policy == SCHED_NORMAL || current->policy == SCHED_BATCH))
> > > +		return growth_fixed;
> > > +
> > > +	if (current->in_memstall)
> > > +		growth_fixed = div64_ul((1024 - rq->avg_rt.util_avg - rq->avg_dl.util_avg
> > > +				- rq->avg_irq.util_avg + 1) * growth, 1024);
> > > +
> > > +	return growth_fixed;
> > > +}
> > > +
> > >  static void init_triggers(struct psi_group *group, u64 now)
> > >  {
> > >  	struct psi_trigger *t;
> > > @@ -658,6 +675,7 @@ static void record_times(struct psi_group_cpu *groupc, u64 now)
> > >  	}
> > >
> > >  	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
> > > +		delta = psi_memtime_fixup(delta);
> >
> > Ok, so we want to deduct IRQ and RT preemption time from the memstall
> > period of an active reclaimer, since it's technically not stalled on
> > memory during this time but on CPU.
> >
> > However, we do NOT want to deduct IRQ and RT time from memstalls that
> > are sleeping on refaults/swapins, since they are not affected by what
> > is going on on the CPU.
>
> I think that focus on RT/IRQ is misguided here, and the implementation
> is horrendous.
>
> So the fundamental question seems to be, and I think Johannes is the one
> to answer that: what time base do these metrics want to use?
>
> Do some of these states want to account in task-time instead of
> wall-time perhaps? I can't quite remember, but vague memories are
> telling me most of the PSI accounting was about blocked tasks, not
> running tasks, which makes all this rather more complicated.

Memstall time is counted as exit - enter, which includes both blocked and
running states. However, we think the blocked time introduced by preemption
from RT/IRQ/DL is irrelevant to memstall (and should be eliminated), while
time spent preempted by other CFS tasks could still count. Thanks to the
load-tracking mechanism, the implementation can be as simple as scaling by
the proportion of CFS utilization within the whole core's capacity.

> Randomly scaling time as proposed seems almost certainly wrong. What
> would that make the stats mean?

It is NOT random scaling; it scales the delta in each record_times() call
for CFS tasks.
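[Editor's note: the following is a minimal, self-contained sketch (plain userspace C, not the kernel patch above) of the scaling idea being discussed: the wall-clock memstall delta of a CFS task is reduced in proportion to the capacity consumed by RT, DL and IRQ on that CPU. The function name, capacity constant, and utilization values below are illustrative placeholders, not values read from a real rq.]

```c
/*
 * Sketch of the proportional-scaling idea from the thread above.
 * Assumes a capacity scale of 1024 (mirroring SCHED_CAPACITY_SCALE);
 * all inputs here are made-up example numbers.
 */
#include <stdio.h>
#include <stdint.h>

#define CAPACITY_SCALE 1024UL

/* Scale a wall-clock memstall delta by the capacity left over for CFS. */
static uint64_t scale_memstall_delta(uint64_t delta_ns,
				     unsigned long util_rt,
				     unsigned long util_dl,
				     unsigned long util_irq)
{
	unsigned long non_cfs = util_rt + util_dl + util_irq;
	/* Clamp so a fully RT/DL/IRQ-loaded CPU does not underflow. */
	unsigned long cfs_share = non_cfs >= CAPACITY_SCALE ?
				  0 : CAPACITY_SCALE - non_cfs;

	return delta_ns * cfs_share / CAPACITY_SCALE;
}

int main(void)
{
	/* Example: 10 ms delta on a CPU where RT+DL+IRQ use ~25% of capacity. */
	uint64_t delta = 10ULL * 1000 * 1000;	/* ns */
	uint64_t scaled = scale_memstall_delta(delta, 200, 30, 26);

	printf("raw delta: %llu ns, scaled delta: %llu ns\n",
	       (unsigned long long)delta, (unsigned long long)scaled);
	return 0;
}
```

With the example numbers, 1024 - 256 = 768 of the capacity is left for CFS, so a 10 ms raw delta is recorded as 7.5 ms, which is the effect psi_memtime_fixup() aims for in the patch.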