Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752710Ab2KRWxY (ORCPT ); Sun, 18 Nov 2012 17:53:24 -0500 Received: from mail-da0-f46.google.com ([209.85.210.46]:37685 "EHLO mail-da0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752639Ab2KRWxW (ORCPT ); Sun, 18 Nov 2012 17:53:22 -0500 Date: Sun, 18 Nov 2012 14:53:19 -0800 (PST) From: David Rientjes X-X-Sender: rientjes@chino.kir.corp.google.com To: Anton Vorontsov cc: Glauber Costa , "Kirill A. Shutemov" , Pekka Enberg , Mel Gorman , Leonid Moiseichuk , KOSAKI Motohiro , Minchan Kim , Bartlomiej Zolnierkiewicz , John Stultz , linux-mm@kvack.org, linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org, patches@linaro.org, kernel-team@android.com, linux-man@vger.kernel.org, KAMEZAWA Hiroyuki , Michal Hocko , Johannes Weiner , Tejun Heo Subject: Re: [RFC v3 0/3] vmpressure_fd: Linux VM pressure notifications In-Reply-To: <20121117012114.GA22910@lizard.sbx05663.mountca.wayport.net> Message-ID: References: <20121115033932.GA15546@lizard.sbx05977.paloaca.wayport.net> <20121115073420.GA19036@lizard.sbx05977.paloaca.wayport.net> <20121115085224.GA4635@lizard> <50A60873.3000607@parallels.com> <50A6AC48.6080102@parallels.com> <20121117012114.GA22910@lizard.sbx05663.mountca.wayport.net> User-Agent: Alpine 2.00 (DEB 1167 2008-08-23) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3468 Lines: 81 On Fri, 16 Nov 2012, Anton Vorontsov wrote: > The main change is that I decided to go with discrete levels of the > pressure. > > When I started writing the man page, I had to describe the 'reclaimer > inefficiency index', and while doing this I realized that I'm describing > how the kernel is doing the memory management, which we try to avoid in > the vmevent. And applications don't really care about these details: > reclaimers, its inefficiency indexes, scanning window sizes, priority > levels, etc. -- it's all "not interesting", and purely kernel's stuff. So > I guess Mel Gorman was right, we need some sort of levels. > > What applications (well, activity managers) are really interested in is > this: > > 1. Do we we sacrifice resources for new memory allocations (e.g. files > cache)? > 2. Does the new memory allocations' cost becomes too high, and the system > hurts because of this? > 3. Are we about to OOM soon? > > And here are the answers: > > 1. VMEVENT_PRESSURE_LOW > 2. VMEVENT_PRESSURE_MED > 3. VMEVENT_PRESSURE_OOM > > There is no "high" pressure, since I really don't see any definition of > it, but it's possible to introduce new levels without breaking ABI. > > Later I came up with the fourth level: > > Maybe it makes sense to implement something like PRESSURE_MILD/BALANCE > with an additional nr_pages threshold, which basically hits the kernel > about how many easily reclaimable pages userland has (that would be a > part of our definition for the mild/balance pressure level). > > I.e. the fourth level can serve as a two-way communication w/ the kernel. > But again, this would be just an extension, I don't want to introduce this > now. > That certainly makes sense, it would be too much of a usage and maintenance burden to assume that the implementation of the VM is to remain the same. > > The set of nodes that a thread is allowed to allocate from may face memory > > pressure up to and including oom while the rest of the system may have a > > ton of free memory. Your solution is to compile and mount memcg if you > > want notifications of memory pressure on those nodes. Others in this > > thread have already said they don't want to rely on memcg for any of this > > and, as Anton showed, this can be tied directly into the VM without any > > help from memcg as it sits today. So why implement a simple and clean > > You meant 'why not'? > Yes, sorry. > > mempressure cgroup that can be used alone or co-existing with either memcg > > or cpusets? > > > > Same thing with a separate mempressure cgroup. The point is that there > > will be users of this cgroup that do not want the overhead imposed by > > memcg (which is why it's disabled in defconfig) and there's no direct > > dependency that causes it to be a part of memcg. > > There's also an API "inconvenince issue" with memcg's usage_in_bytes > stuff: applications have a hard time resetting the threshold to 'emulate' > the pressure notifications, and they also have to count bytes (like 'total > - used = free') to set the threshold. While a separate 'pressure' > notifications shows exactly what apps actually want to know: the pressure. > Agreed. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/