Date: Thu, 10 May 2018 10:49:43 -0400
From: Johannes Weiner
To: Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-block@vger.kernel.org, cgroups@vger.kernel.org,
	Ingo Molnar, Andrew Morton, Tejun Heo, Balbir Singh,
	Mike Galbraith, Oliver Yang, Shakeel Butt, xxx xxx,
	Taras Kondratiuk, Daniel Walker, Vinayak Menon,
	Ruslan Ruslichenko,
	kernel-team@fb.com
Subject: Re: [PATCH 7/7] psi: cgroup support
Message-ID: <20180510144943.GH19348@cmpxchg.org>
References: <20180507210135.1823-1-hannes@cmpxchg.org>
	<20180507210135.1823-8-hannes@cmpxchg.org>
	<20180509110736.GR12217@hirez.programming.kicks-ass.net>
In-Reply-To: <20180509110736.GR12217@hirez.programming.kicks-ass.net>

On Wed, May 09, 2018 at 01:07:36PM +0200, Peter Zijlstra wrote:
> On Mon, May 07, 2018 at 05:01:35PM -0400, Johannes Weiner wrote:
> > --- a/kernel/sched/psi.c
> > +++ b/kernel/sched/psi.c
> > @@ -260,6 +260,18 @@ void psi_task_change(struct task_struct *task, u64 now, int clear, int set)
> >  	task->psi_flags |= set;
> >
> >  	psi_group_update(&psi_system, cpu, now, clear, set);
> > +
> > +#ifdef CONFIG_CGROUPS
> > +	cgroup = task->cgroups->dfl_cgrp;
> > +	while (cgroup && (parent = cgroup_parent(cgroup))) {
> > +		struct psi_group *group;
> > +
> > +		group = cgroup_psi(cgroup);
> > +		psi_group_update(group, cpu, now, clear, set);
> > +
> > +		cgroup = parent;
> > +	}
> > +#endif
> >  }
>
> TJ fixed needing that for stats at some point, why can't you do the
> same?

The stats deltas are all additive, so it's okay to delay flushing them
up the tree until right before somebody is trying to look at them.

With this, though, we are tracking time of an aggregate state composed
of child tasks, and that state might not be identical for you and all
your ancestors. So every time a task state changes we have to evaluate
and start/stop clocks on every level, because we cannot derive our
state from the state history of our child groups.

For example, say you have the following tree:

            root
           /
          A
        /   \
      A1      A2
  running=1  running=1

I.e. there is a running task in A1 and one in A2.
root, A, A1, and A2 are all PSI_NONE as nothing is stalled.

Now the task in A2 enters a memstall.

            root
           /
          A
        /   \
      A1      A2
  running=1  memstall=1

From the perspective of A2, the group is now fully blocked and starts
recording time in PSI_FULL.

From the perspective of A, it has a working group below it and a
stalled one, which would make it PSI_SOME, so it starts recording time
in PSI_SOME.

The root/system level likewise has to start the timer on PSI_SOME.

Now the task in A1 enters a memstall, and we have to propagate the
PSI_FULL state up A1 -> A -> root.

I'm not quite sure how we could make this lazy. Say we hadn't
propagated the state from A1 and A2 right away, and somebody is asking
about the averages for A. We could tell that A1 and A2 had been in
PSI_FULL recently, but we wouldn't know exactly whether their time in
these states fully overlapped (all PSI_FULL), overlapped partially
(some PSI_FULL and some PSI_SOME), or didn't overlap at all (PSI_SOME).
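
To make the example above concrete, here is a rough userspace sketch of
the eager propagation. It is not the code from the patch: struct group,
classify(), group_change(), and replay_example() are hypothetical names
made up for illustration, the timestamps (0, 10, 30, 50) are arbitrary
units, and unlike the patch (which handles the system level through
psi_system) the root is modeled as just another group.

```c
#include <assert.h>
#include <stddef.h>

/* Pressure states as named in the mail. */
enum psi_state { PSI_NONE, PSI_SOME, PSI_FULL };

/* Hypothetical stand-in for per-cgroup bookkeeping, not struct psi_group. */
struct group {
	struct group *parent;
	int running;                /* productive tasks in this subtree */
	int memstall;               /* stalled tasks in this subtree */
	enum psi_state state;       /* current aggregate state */
	unsigned long long since;   /* when the current state was entered */
	unsigned long long time[3]; /* accumulated time, indexed by state */
};

/* Derive the aggregate state from the task counts of one group. */
static enum psi_state classify(const struct group *g)
{
	if (g->memstall == 0)
		return PSI_NONE;
	return g->running > 0 ? PSI_SOME : PSI_FULL;
}

/*
 * Apply a task state change to the task's group and to every ancestor:
 * close out the time spent in the old state, adjust the counts, and
 * re-evaluate the state at that level.
 */
static void group_change(struct group *g, unsigned long long now,
			 int d_running, int d_memstall)
{
	for (; g; g = g->parent) {
		g->time[g->state] += now - g->since;
		g->since = now;
		g->running += d_running;
		g->memstall += d_memstall;
		assert(g->running >= 0 && g->memstall >= 0);
		g->state = classify(g);
	}
}

struct example { struct group root, a, a1, a2; };

/* Replay the scenario from the mail with made-up timestamps. */
static void replay_example(struct example *e)
{
	*e = (struct example){0};
	e->a.parent = &e->root;
	e->a1.parent = &e->a;
	e->a2.parent = &e->a;

	group_change(&e->a1, 0, +1, 0);   /* task starts running in A1 */
	group_change(&e->a2, 0, +1, 0);   /* task starts running in A2 */
	group_change(&e->a2, 10, -1, +1); /* A2's task enters a memstall */
	group_change(&e->a1, 30, -1, +1); /* A1's task enters a memstall */
	group_change(&e->a1, 50, 0, 0);   /* no-op changes to close out */
	group_change(&e->a2, 50, 0, 0);   /* the accounting at t=50 */
}
```

With these made-up timestamps, A2 ends up with 40 units of PSI_FULL and
A1 with 20, while A records 20 units of PSI_SOME followed by 20 of
PSI_FULL. The per-child totals alone would not tell you that split for
A; a different interleaving with the same child totals would give A a
different result, which is the overlap problem described above.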