Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1754412AbdDMQB6 (ORCPT ); Thu, 13 Apr 2017 12:01:58 -0400
Received: from gum.cmpxchg.org ([85.214.110.215]:45324 "EHLO gum.cmpxchg.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1754281AbdDMQBz (ORCPT ); Thu, 13 Apr 2017 12:01:55 -0400
Date: Thu, 13 Apr 2017 12:01:47 -0400
From: Johannes Weiner
To: Minchan Kim
Cc: Tim Murray, Michal Hocko, Vladimir Davydov, LKML,
        cgroups@vger.kernel.org, Linux-MM, Suren Baghdasaryan,
        Patrik Torstensson, Android Kernel Team
Subject: Re: [RFC 0/1] add support for reclaiming priorities per mem cgroup
Message-ID: <20170413160147.GB29727@cmpxchg.org>
References: <20170317231636.142311-1-timmurray@google.com>
        <20170330155123.GA3929@cmpxchg.org>
        <20170413043047.GA16783@bbox>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170413043047.GA16783@bbox>
User-Agent: Mutt/1.8.0 (2017-02-23)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 3002
Lines: 69

On Thu, Apr 13, 2017 at 01:30:47PM +0900, Minchan Kim wrote:
> On Thu, Mar 30, 2017 at 12:40:32PM -0700, Tim Murray wrote:
> > As a result, I think there's still a need for relative priority
> > between mem cgroups, not just an absolute limit.
> >
> > Does that make sense?
>
> I agree with it.
>
> Recently, embedded platform's workload for smart things would be much
> diverse(from game to alarm) so it's hard to handle the absolute limit
> proactively and userspace has more hints about what workloads are
> more important(ie, greedy) compared to others although it would be
> harmful for something(e.g., it's not visible effect to user)
>
> As a such point of view, I support this idea as basic approach.
> And with thrashing detector from Johannes, we can do fine-tune of
> LRU balancing and vmpressure shooting time better.
>
> Johannes,
>
> Do you have any concern about this memcg prority idea?

While I fully agree that relative priority levels would be easier to
configure, this patch doesn't really do that. It allows you to set a
scan window divider to a fixed amount and, as I already pointed out,
the scan window is no longer representative of memory pressure.

[ Really, sc->priority should probably just be called LRU lookahead
  factor or something, there is not much about it being representative
  of any kind of urgency anymore. ]

With this patch, if you configure the priorities of two 8G groups to 0
and 4, reclaim will treat them exactly the same*. If you configure the
priorities of two 100G groups to 0 and 7, reclaim will treat them
exactly the same. The bigger the group, the more of the lower range of
the priority range becomes meaningless, because once the divider
produces outcomes bigger than SWAP_CLUSTER_MAX (32), it doesn't
actually bias reclaim anymore.

So that's not a portable relative scale of pressure discrimination.

But the bigger problem with this is that, as sc->priority doesn't
represent memory pressure anymore, it is merely a cut-off for which
groups to scan and which groups not to scan *based on their size*.

That is the same as setting memory.low!

* For simplicity, I'm glossing over the fact here that LRUs are split
  by type and into inactive/active, so in reality the numbers are a
  little different, but you get the point.

> Or
> Do you think the patchset you are preparing solve this situation?

It's certainly a requirement. In order to implement a relative scale
of memory pressure discrimination, we first need to be able to really
quantify memory pressure.
Then we can either allow setting absolute latency/slowdown minimums
for each group, with reclaim skipping groups above those thresholds,
or we can map a relative priority scale against the total slowdown due
to lack of memory in the system, and each group gets a relative share
based on its priority compared to other groups.

But there is no way around first having a working measure of memory
pressure before we can meaningfully distribute it among the groups.

Thanks