Received: by 2002:a25:1506:0:0:0:0:0 with SMTP id 6csp1625141ybv; Sat, 8 Feb 2020 02:22:09 -0800 (PST) X-Google-Smtp-Source: APXvYqyQoi7l8qvim72R2HhCn/y34Fa/SLvw30v8YcmTvoQ5Ty8kNUW3pe2hTGU+qGLFs2zMdPhD X-Received: by 2002:a05:6808:181:: with SMTP id w1mr5118141oic.176.1581157329645; Sat, 08 Feb 2020 02:22:09 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1581157329; cv=none; d=google.com; s=arc-20160816; b=LHhpV8HC854mMb3C5Ds+tLPPwOYbq6HT2jahwk3gMJZmHxfNmzRoCeKcqIxRmQKI/N iTwT8g8+LliZwTglgAXs6mdTdvnMH8wG2vrs2cjX0vMH5pZx/UGxlmpWc8EUi5p/XSOU 2pqUOQzcKma9HK5IlMpN3HBH31V+GMy6NQ1pOf82CSzkqeelGS9ngPhpcJ07SsB8eRfD 8vwzRY05pIjhhfrBUmSgBBNhN4b6T2mVqyStOBhbo9JBUYncvpbn+1BNBZNmKTciqSbg Pn/x6R9atqNto+K/wDw8dTfbxKJCkpXGmhhdWhZcmzG6K17i9LRZu//BHqyXFPF9aKLd yW9g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=WF5jCu9nyuBI+0Vre+LHe4YP9D2Tsa5c5fqD1Y+shy4=; b=0lj7iDspxzvszqw0BJOo1AWzqm2/omOfhAlS58YZhsPPW1vJ5nJe89DfcZLppYQ0KY 25FLarv2GvYHt+BBKBXBGEAWqFnei41UO2KHn6qwFhwFQjzNAb7kxzSoZ/5U49kwB5/k zlchSIEnlsJkApkp4MITLcOF0g3QBT8PT3IgYXykUEqmph5HJFQlC0Ehu6W9AOYQwR0y NLwm2IPO+k78SkJv05tGjJtXppZAV6Lfx3Ins2BVgSkcaCPakViaRQnAW/wj0SBmaoAP hx/bSlXYxgt9s0fphm+uWkCRdzvpXzSP5qVZ+n8TcqvQVbHZ1y1l3dR/UNi7/dTgk15J tF3g== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b="f/7m2ofd"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id y8si5030061oih.141.2020.02.08.02.21.58; Sat, 08 Feb 2020 02:22:09 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=fail header.i=@infradead.org header.s=merlin.20170209 header.b="f/7m2ofd"; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727144AbgBHKUe (ORCPT + 99 others); Sat, 8 Feb 2020 05:20:34 -0500 Received: from merlin.infradead.org ([205.233.59.134]:52894 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726876AbgBHKUe (ORCPT ); Sat, 8 Feb 2020 05:20:34 -0500 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=merlin.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=WF5jCu9nyuBI+0Vre+LHe4YP9D2Tsa5c5fqD1Y+shy4=; b=f/7m2ofdHzR3uczftn7k/gCaYP /uAD4+xs/QVCw5p94EaSw0QHxje8hNruejcAserp8U/20pPpubaBNk97qDYOSBnm6r69sLhcfeHl6 NXK4EvRw//zYVM/3drA4Tlv13neeg//ob16XWy/GnvkPi4Jy/OObIaTqPmz2l8VSkhXcy02NRjwzj coM733yniMeHCtLGbpDw/2cEOvH2s6Qxa8Qr4CMYAZES2cZWtzDU6ORQgPqlAYmDyrNzOWsS4nPrL /WFYuE00/OVOeoU+n/BjqN8tMuUQSxZyum4NSd3kcciG5cePkrfswgV294e1PssGMRTMwuKokMF4z ygqafJQQ==; Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=noisy.programming.kicks-ass.net) by merlin.infradead.org with esmtpsa (Exim 4.92.3 #3 (Red Hat Linux)) id 1j0NDV-0000xg-GA; Sat, 08 Feb 2020 10:20:01 +0000 Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client did not present a certificate) by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id AEF87300596; Sat, 8 Feb 2020 11:18:10 +0100 (CET) Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000) id D25302B88CC00; Sat, 8 Feb 2020 11:19:57 +0100 (CET) Date: Sat, 8 Feb 2020 11:19:57 +0100 From: Peter Zijlstra To: Johannes Weiner Cc: Ivan Babrou , linux-kernel , kernel-team , Ingo Molnar , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman Subject: Re: Lower than expected CPU pressure in PSI Message-ID: <20200208101957.GU14946@hirez.programming.kicks-ass.net> References: <20200109161632.GB8547@cmpxchg.org> <20200207130829.GG14897@hirez.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20200207130829.GG14897@hirez.programming.kicks-ass.net> User-Agent: Mutt/1.10.1 (2018-07-13) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Feb 07, 2020 at 02:08:29PM +0100, Peter Zijlstra wrote: > On Thu, Jan 09, 2020 at 11:16:32AM -0500, Johannes Weiner wrote: > > On Wed, Jan 08, 2020 at 11:47:10AM -0800, Ivan Babrou wrote: > > > We added reporting for PSI in cgroups and results are somewhat surprising. > > > > > > My test setup consists of 3 services: > > > > > > * stress-cpu1-no-contention.service : taskset -c 1 stress --cpu 1 > > > * stress-cpu2-first-half.service : taskset -c 2 stress --cpu 1 > > > * stress-cpu2-second-half.service : taskset -c 2 stress --cpu 1 > > > > > > First service runs unconstrained, the other two compete for CPU. > > > > > > As expected, I can see 500ms/s sched delay for the latter two and > > > aggregated 1000ms/s delay for /system.slice, no surprises here. > > > > > > However, CPU pressure reported by PSI says that none of my services > > > have any pressure on them. I can see around 434ms/s pressure on > > > /unified/system.slice and 425ms/s pressure on /unified cgroup, which > > > is surprising for three reasons: > > > > > > * Pressure is absent for my services (I expect it to match scheed delay) > > > * Pressure on /unified/system.slice is lower than both 500ms/s and 1000ms/s > > > * Pressure on root cgroup is lower than on system.slice > > > > CPU pressure is currently implemented based only on the number of > > *runnable* tasks, not on who gets to actively use the CPU. This works > > for contention within cgroups or at the global scope, but it doesn't > > correctly reflect competition between cgroups. It also doesn't show > > the effects of e.g. cpu cycle limiting through cpu.max where there > > might *be* only one runnable task, but it's not getting the CPU. > > > > I've been working on fixing this, but hadn't gotten around to sending > > the patch upstream. Attaching it below. Would you mind testing it? > > > > Peter, what would you think of the below? > > I'm not loving it; but I see what it does and I can't quickly see an > alternative. > > My main gripe is doing even more of those cgroup traversals. > > One thing pick_next_task_fair() does is try and limit the cgroup > traversal to the sub-tree that contains both prev and next. Not sure > that is immediately applicable here, but it might be worth looking into. One option I suppose, would be to replace this: +static inline void psi_sched_switch(struct task_struct *prev, + struct task_struct *next, + bool sleep) +{ + if (static_branch_likely(&psi_disabled)) + return; + + /* + * Clear the TSK_ONCPU state if the task was preempted. If + * it's a voluntary sleep, dequeue will have taken care of it. + */ + if (!sleep) + psi_task_change(prev, TSK_ONCPU, 0); + + psi_task_change(next, 0, TSK_ONCPU); +} With something like: static inline void psi_sched_switch(struct task_struct *prev, struct task_struct *next, bool sleep) { struct psi_group *g, *p = NULL; set = TSK_ONCPU; clear = 0; while ((g = iterate_group(next, &g))) { u32 nr_running = per_cpu_ptr(g->pcpu, cpu)->tasks[NR_RUNNING]; if (nr_running) { /* if set, we hit the subtree @prev lives in, terminate */ p = g; break; } /* the rest of psi_task_change */ } if (sleep) return; set = 0; clear = TSK_ONCPU; while ((g = iterate_group(prev, &g))) { if (g == p) break; /* the rest of psi_task_change */ } } That way we avoid clearing and setting the common parents.