From: Zhaoyang Huang
Date: Wed, 10 Nov 2021 17:04:14 +0800
Subject: Re: [Resend PATCH] psi : calc cfs task memstall time more precisely
To: Vincent Guittot
Cc: Peter Zijlstra, Johannes Weiner, Andrew Morton, Michal Hocko, Vladimir Davydov, Zhaoyang Huang, "open list:MEMORY MANAGEMENT", LKML
References: <1634278612-17055-1-git-send-email-huangzhaoyang@gmail.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 10, 2021 at 4:49 PM Vincent Guittot wrote:
>
> On Tue, 9 Nov 2021 at 15:56, Peter Zijlstra wrote:
> >
> > On Tue, Nov 02, 2021 at 03:47:33PM -0400, Johannes Weiner wrote:
> > > CC peterz as well for rt and timekeeping magic
> > >
> > > On Fri, Oct 15, 2021 at 02:16:52PM +0800, Huangzhaoyang wrote:
> > > > From: Zhaoyang Huang
> > > >
> > > > In an EAS enabled system, there are two scenarios discordant to current design,
> > > >
> > > > 1. workload used to be heavy uneven among cores for sake of scheduler policy.
> > > > RT task usually preempts CFS task in little core.
> > > > 2. CFS task's memstall time is counted as simple as exit - entry so far, which
> > > > ignore the preempted time by RT, DL and Irqs.
> >
> > It ignores preemption full-stop.
> > I don't see why RT/IRQ should be special cased here.
> >
> > > > With these two constraints, the percpu nonidle time would be mainly consumed by
> > > > none CFS tasks and couldn't be averaged. Eliminating them by calc the time growth
> > > > via the proportion of cfs_rq's utilization on the whole rq.
> > > >
> > > > +static unsigned long psi_memtime_fixup(u32 growth)
> > > > +{
> > > > +	struct rq *rq = task_rq(current);
> > > > +	unsigned long growth_fixed = (unsigned long)growth;
> > > > +
> > > > +	if (!(current->policy == SCHED_NORMAL || current->policy == SCHED_BATCH))
> > > > +		return growth_fixed;
> > > > +
> > > > +	if (current->in_memstall)
> > > > +		growth_fixed = div64_ul((1024 - rq->avg_rt.util_avg - rq->avg_dl.util_avg
> > > > +					- rq->avg_irq.util_avg + 1) * growth, 1024);
> > > > +
> > > > +	return growth_fixed;
> > > > +}
> > > > +
> > > >  static void init_triggers(struct psi_group *group, u64 now)
> > > >  {
> > > >  	struct psi_trigger *t;
> > > > @@ -658,6 +675,7 @@ static void record_times(struct psi_group_cpu *groupc, u64 now)
> > > >  	}
> > > >
> > > >  	if (groupc->state_mask & (1 << PSI_MEM_SOME)) {
> > > > +		delta = psi_memtime_fixup(delta);
> > >
> > > Ok, so we want to deduct IRQ and RT preemption time from the memstall
> > > period of an active reclaimer, since it's technically not stalled on
> > > memory during this time but on CPU.
> > >
> > > However, we do NOT want to deduct IRQ and RT time from memstalls that
> > > are sleeping on refaults swapins, since they are not affected by what
> > > is going on on the CPU.
> >
> > I think that focus on RT/IRQ is mis-guided here, and the implementation
> > is horrendous.
> >
> > So the fundamental question seems to be; and I think Johannes is the one
> > to answer that: What time-base do these metrics want to use?
> >
> > Do some of these states want to account in task-time instead of
> > wall-time perhaps?
> > I can't quite remember, but vague memories are
> > telling me most of the PSI accounting was about blocked tasks, not
> > running tasks, which makes all this rather more complicated.
>
> I tend to agree with this.
> Using rq_clock_task(rq) instead of cpu_clock(cpu) will remove the time
> spent under interrupt as an example
> and AFAICT, rq->clock_task is updated before calling psi function

Thanks, Vincent. Could rq_clock_task() also help remove the time a CFS
task spends preempted by RT/DL tasks? That is the main part we want to
fix in the memstall time.

>
> >
> > Randomly scaling time as proposed seems almost certainly wrong. What
> > would that make the stats mean?
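
For reference, the scaling that the quoted psi_memtime_fixup() applies can be modelled in userspace. This is only a rough sketch of the arithmetic, not kernel code: scale_memstall_delta(), scale_memstall_selfcheck() and UTIL_SCALE are hypothetical names, the three util arguments stand in for rq->avg_rt.util_avg, rq->avg_dl.util_avg and rq->avg_irq.util_avg, and the clamp is an assumption added here (the patch has none).

```c
#include <assert.h>
#include <stdint.h>

/* The constant 1024 used by the quoted patch as the scaling base. */
#define UTIL_SCALE 1024u

/*
 * Userspace model of the arithmetic in the quoted psi_memtime_fixup().
 * rt_util, dl_util and irq_util stand in for the per-rq util_avg values.
 *
 * Assumption (not in the patch): the summed utilization is clamped to
 * UTIL_SCALE so that the subtraction below cannot wrap around.
 */
uint64_t scale_memstall_delta(uint32_t delta, uint32_t rt_util,
			      uint32_t dl_util, uint32_t irq_util)
{
	uint64_t busy = (uint64_t)rt_util + dl_util + irq_util;

	if (busy > UTIL_SCALE)
		busy = UTIL_SCALE;

	/* Same formula as the patch: (1024 - rt - dl - irq + 1) * growth / 1024 */
	return (UTIL_SCALE - busy + 1) * (uint64_t)delta / UTIL_SCALE;
}

/* A few spot checks of the arithmetic. */
void scale_memstall_selfcheck(void)
{
	/* No RT/DL/IRQ utilization: the "+ 1" makes the result exceed the input. */
	assert(scale_memstall_delta(1024, 0, 0, 0) == 1025);
	/* Half of the CPU used by RT: roughly half of the delta remains. */
	assert(scale_memstall_delta(1024, 512, 0, 0) == 513);
	/* Utilization above the clamp: almost all of the delta is removed. */
	assert(scale_memstall_delta(1000, 1024, 1024, 0) == 0);
}
```

The spot checks show that the result depends only on the averaged utilization at the moment of the call, not on whether the stalled task was actually preempted during the measured interval. That appears to be the concern behind the "randomly scaling time" objection quoted above.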