Date: Fri, 7 Feb 2020 14:08:29 +0100
From: Peter Zijlstra
To: Johannes Weiner
Cc: Ivan Babrou, linux-kernel, kernel-team, Ingo Molnar, Juri Lelli,
    Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
    Mel Gorman
Subject: Re: Lower than expected CPU pressure in PSI
Message-ID: <20200207130829.GG14897@hirez.programming.kicks-ass.net>
References: <20200109161632.GB8547@cmpxchg.org>
In-Reply-To: <20200109161632.GB8547@cmpxchg.org>

On Thu, Jan 09, 2020 at 11:16:32AM -0500, Johannes Weiner wrote:
> On Wed, Jan 08, 2020 at 11:47:10AM -0800, Ivan Babrou wrote:
> > We added reporting for PSI in cgroups and the results are somewhat
> > surprising.
> >
> > My test setup consists of 3 services:
> >
> > * stress-cpu1-no-contention.service : taskset -c 1 stress --cpu 1
> > * stress-cpu2-first-half.service : taskset -c 2 stress --cpu 1
> > * stress-cpu2-second-half.service : taskset -c 2 stress --cpu 1
> >
> > The first service runs unconstrained; the other two compete for CPU.
> >
> > As expected, I can see 500ms/s sched delay for the latter two and an
> > aggregated 1000ms/s delay for /system.slice, no surprises here.
> >
> > However, CPU pressure reported by PSI says that none of my services
> > have any pressure on them. I can see around 434ms/s pressure on
> > /unified/system.slice and 425ms/s pressure on the /unified cgroup,
> > which is surprising for three reasons:
> >
> > * Pressure is absent for my services (I expect it to match sched delay)
> > * Pressure on /unified/system.slice is lower than both 500ms/s and 1000ms/s
> > * Pressure on the root cgroup is lower than on system.slice
>
> CPU pressure is currently implemented based only on the number of
> *runnable* tasks, not on who gets to actively use the CPU. This works
> for contention within cgroups or at the global scope, but it doesn't
> correctly reflect competition between cgroups. It also doesn't show
> the effects of e.g. cpu cycle limiting through cpu.max, where there
> might *be* only one runnable task, but it's not getting the CPU.
>
> I've been working on fixing this, but hadn't gotten around to sending
> the patch upstream. Attaching it below. Would you mind testing it?
>
> Peter, what would you think of the below?

I'm not loving it; but I see what it does and I can't quickly see an
alternative.

My main gripe is doing even more of those cgroup traversals.

One thing pick_next_task_fair() does is try and limit the cgroup
traversal to the sub-tree that contains both prev and next. Not sure
that is immediately applicable here, but it might be worth looking
into.
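
For reference, the walk I mean looks roughly like this (simplified and
from memory, so modulo details; see pick_next_task_fair() in
kernel/sched/fair.c for the real thing):

	struct sched_entity *se = &p->se;	/* next task's entity */
	struct sched_entity *pse = &prev->se;	/* prev task's entity */
	struct cfs_rq *cfs_rq;

	/*
	 * Step both cursors up in lockstep, deeper one first, and stop
	 * at the first cfs_rq that prev and next share, instead of
	 * running put_prev/set_next over each full hierarchy.
	 */
	while (!(cfs_rq = is_same_group(se, pse))) {
		int se_depth = se->depth;
		int pse_depth = pse->depth;

		if (se_depth <= pse_depth) {
			put_prev_entity(cfs_rq_of(pse), pse);
			pse = parent_entity(pse);
		}
		if (se_depth >= pse_depth) {
			set_next_entity(cfs_rq_of(se), se);
			se = parent_entity(se);
		}
	}

	put_prev_entity(cfs_rq, pse);
	set_next_entity(cfs_rq, se);

The depth comparisons keep the two cursors level, so the amount of
put/set work is bounded by the distance to the common ancestor rather
than by the full depth of the cgroup tree.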
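
And for anyone following along: the runnable-only accounting Johannes
describes above is the PSI_CPU_SOME case of test_state() in
kernel/sched/psi.c, which currently (again from memory) boils down to:

	case PSI_CPU_SOME:
		/*
		 * More than one runnable task on this CPU means at
		 * least one of them is waiting for it; who actually
		 * owns the CPU is never consulted.
		 */
		return tasks[NR_RUNNING] > 1;

That is why a task that is runnable but continuously losing out to a
sibling cgroup (or throttled by cpu.max) never shows up as CPU
pressure in its own group. As I understand it, the patch below
effectively compares the runnable count against the number of tasks
actually on the CPU instead of against the constant 1.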