Message-ID: <1501554199.5269.22.camel@gmx.de>
Subject: Re: [PATCH 3/3] mm/sched: memdelay: memory health interface for systems and workloads
From: Mike Galbraith
To: Johannes Weiner
Cc: Peter Zijlstra, Ingo Molnar, Andrew Morton, Rik van Riel, Mel Gorman,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com
Date: Tue, 01 Aug 2017 04:23:19 +0200
In-Reply-To: <20170731203839.GA5162@cmpxchg.org>
References: <20170727153010.23347-1-hannes@cmpxchg.org>
 <20170727153010.23347-4-hannes@cmpxchg.org>
 <20170729091055.GA6524@worktop.programming.kicks-ass.net>
 <20170730152813.GA26672@cmpxchg.org>
 <20170731083111.tgjgkwge5dgt5m2e@hirez.programming.kicks-ass.net>
 <20170731184142.GA30943@cmpxchg.org>
 <1501530579.9118.43.camel@gmx.de>
 <20170731203839.GA5162@cmpxchg.org>

On Mon, 2017-07-31 at 16:38 -0400, Johannes Weiner wrote:
> On Mon, Jul 31, 2017 at 09:49:39PM +0200, Mike Galbraith wrote:
> > On Mon, 2017-07-31 at 14:41 -0400, Johannes Weiner wrote:
> > >
> > > Adding an rq counter for tasks inside memdelay sections should be
> > > straightforward as well (except for maybe the migration cost of that
> > > state between CPUs in ttwu that Mike pointed out).
> >
> > What I pointed out should be easily eliminated (zero use case).
>
> How so? I was thinking along the lines of schedstat_enabled().
>
> > > That leaves the question of how to track these numbers per cgroup at
> > > an acceptable cost. The idea for a tree of cgroups is that the
> > > walltime impact of delays at each level is reported for all tasks at
> > > or below that level. E.g. a leaf group aggregates the state of its
> > > own tasks, the root/system aggregates the state of all tasks in the
> > > system; hence the propagation of the task state counters up the
> > > hierarchy.
> >
> > The crux of the biscuit is where exactly the investment return lies.
> > Gathering of these numbers ain't gonna be free, no matter how hard you
> > try, and you're plugging into paths where every cycle added is made of
> > userspace hide.
>
> Right.
> But how to implement it sanely and optimize for cycles, and whether we
> want to default-enable this interface are two separate conversations.
>
> It makes sense to me to first make the implementation as lightweight
> on cycles and maintainability as possible, and then worry about the
> cost/benefit defaults of the shipped Linux kernel afterwards.
>
> That goes for the purely informative userspace interface, anyway. The
> easily-provoked thrashing livelock I have described in the email to
> Andrew is a different matter. If the OOM killer requires hooking up to
> this metric to fix it, it won't be optional. But the OOM code isn't
> part of this series yet, so again a conversation best had later, IMO.

If that "the many must pay a toll to save the few" conversation ever
happens, just recall me registering my boo/hiss in advance. I don't
have to feel guilty about not liking the idea of making donations to
feed the poor starving proggies ;-)

	-Mike