Date: Tue, 2 Nov 2021 15:47:33 -0400
From: Johannes Weiner
To: Huangzhaoyang
Cc: Andrew Morton, Michal Hocko, Vladimir Davydov, Zhaoyang Huang, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Peter Zijlstra
Subject: Re: [Resend PATCH] psi : calc cfs task memstall time more precisely
References: <1634278612-17055-1-git-send-email-huangzhaoyang@gmail.com>
In-Reply-To: <1634278612-17055-1-git-send-email-huangzhaoyang@gmail.com>

CC peterz as well for rt and timekeeping magic

On Fri, Oct 15, 2021 at 02:16:52PM +0800, Huangzhaoyang wrote:
> From: Zhaoyang Huang
>
> In an EAS enabled system, there are two scenarios discordant to current design,
>
> 1. workload used to be heavy uneven among cores for sake of scheduler policy.
> RT task usually preempts CFS task in little core.
> 2. CFS task's memstall time is counted as simple as exit - entry so far, which
> ignore the preempted time by RT, DL and Irqs.
>
> With these two constraints, the percpu nonidle time would be mainly consumed by
> none CFS tasks and couldn't be averaged. Eliminating them by calc the time growth
> via the proportion of cfs_rq's utilization on the whole rq.
>
> eg.
> Here is the scenario which this commit want to fix, that is the rt and irq consume
> some utilization of the whole rq. This scenario could be typical in a core
> which is assigned to deal with all irqs. Furthermore, the rt task used to run on
> little core under EAS.
>
> Binder:305_3-314 [002] d..1 257.880195: psi_memtime_fixup: original:30616,adjusted:25951,se:89,cfs:353,rt:139,dl:0,irq:18
> droid.phone-1525 [001] d..1 265.145492: psi_memtime_fixup: original:61616,adjusted:53492,se:55,cfs:225,rt:121,dl:0,irq:15
>
> Signed-off-by: Zhaoyang Huang
> ---
>  kernel/sched/psi.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index cc25a3c..754a836 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -182,6 +182,8 @@ struct psi_group psi_system = {
>
>  static void psi_avgs_work(struct work_struct *work);
>
> +static unsigned long psi_memtime_fixup(u32 growth);
> +
>  static void group_init(struct psi_group *group)
>  {
>  	int cpu;
> @@ -492,6 +494,21 @@ static u64 window_update(struct psi_window *win, u64 now, u64 value)
>  	return growth;
>  }
>
> +static unsigned long psi_memtime_fixup(u32 growth)
> +{
> +	struct rq *rq = task_rq(current);
> +	unsigned long growth_fixed = (unsigned long)growth;
> +
> +	if (!(current->policy == SCHED_NORMAL || current->policy == SCHED_BATCH))
> +		return growth_fixed;
> +
> +	if (current->in_memstall)
> +		growth_fixed = div64_ul((1024 - rq->avg_rt.util_avg - rq->avg_dl.util_avg
> +					- rq->avg_irq.util_avg + 1) * growth, 1024);
> +
> +	return growth_fixed;
> +}
> +
>  static void init_triggers(struct psi_group *group, u64 now)
>  {
>  	struct psi_trigger *t;
> @@ -658,6 +675,7 @@ static void record_times(struct psi_group_cpu *groupc, u64 now)
>  	}
>
>  	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
> +		delta = psi_memtime_fixup(delta);

Ok, so we want to deduct IRQ and RT preemption time from the memstall
period of an active reclaimer, since it's technically not stalled on
memory during this time but on CPU.

However, we do NOT want to deduct IRQ and RT time from memstalls that
are sleeping on refaults or swapins, since they are not affected by
what is going on on the CPU.

Does util_avg capture that difference? I'm not confident it does - but
correct me if I'm wrong. We need the length of time during which an
IRQ or an RT task preempted the old rq->curr, not the absolute irq/rt
length.

(Btw, such preemption periods, in addition to being deducted from
memory stalls, should probably also be added to CPU contention stalls,
to make CPU pressure reporting more accurate as well.)
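
To make the distinction concrete, here is a minimal userspace sketch in C, not kernel code. memtime_scaled() models the arithmetic of the quoted psi_memtime_fixup(), with plain integer arguments standing in for rq->avg_rt.util_avg, rq->avg_dl.util_avg and rq->avg_irq.util_avg. memtime_deducted() is a hypothetical helper, not part of the patch or of the kernel; it assumes the caller can somehow supply the time during which the stalled task was actually preempted, which is the quantity asked for above.

```c
#include <stdint.h>

#define UTIL_SCALE 1024u /* full-scale value used by the quoted patch */

/*
 * Model of the quoted psi_memtime_fixup(): scale the stall growth by the
 * share of the runqueue that RT, DL and IRQ did not use. Assumes that
 * rt + dl + irq <= UTIL_SCALE, as the patch implicitly does.
 */
static uint64_t memtime_scaled(uint32_t growth, uint32_t rt,
			       uint32_t dl, uint32_t irq)
{
	uint64_t share = (uint64_t)UTIL_SCALE + 1 - rt - dl - irq;

	return share * growth / UTIL_SCALE;
}

/*
 * Hypothetical alternative: subtract only the time during which the
 * stalled task was really preempted. A task sleeping on a refault or
 * swapin was never preempted, so it passes preempted == 0 and keeps
 * its full stall time.
 */
static uint64_t memtime_deducted(uint64_t growth, uint64_t preempted)
{
	return preempted < growth ? growth - preempted : 0;
}
```

With the values from the two trace lines in the patch description, memtime_scaled(30616, 139, 0, 18) gives 25951 and memtime_scaled(61616, 121, 0, 15) gives 53492, matching the "adjusted" fields. The scaling depends only on runqueue-wide averages, so a sleeping memstall is shortened by the same factor as an active reclaimer; memtime_deducted() leaves it untouched when no preemption time is reported.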