Date: Thu, 11 Apr 2019 17:19:11 +0200
From: Michal Hocko
To: Waiman Long
Cc: Tejun Heo, Li Zefan, Johannes Weiner, Jonathan Corbet, Vladimir Davydov,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-doc@vger.kernel.org, linux-mm@kvack.org, Andrew Morton,
    Roman Gushchin, Shakeel Butt, Kirill Tkhai, Aaron Lu
Subject: Re: [RFC PATCH 0/2] mm/memcontrol: Finer-grained memory control
Message-ID: <20190411151911.GZ10383@dhcp22.suse.cz>
References: <20190410191321.9527-1-longman@redhat.com>
 <20190410195443.GL10383@dhcp22.suse.cz>
On Thu 11-04-19 10:02:16, Waiman Long wrote:
> On 04/10/2019 03:54 PM, Michal Hocko wrote:
> > On Wed 10-04-19 15:13:19, Waiman Long wrote:
> >> The current control mechanism for memory cgroup v2 lumps all the
> >> memory together irrespective of the type of memory objects. However,
> >> there are cases where users may be more concerned about one type of
> >> memory usage than the others.
> >>
> >> We have a customer request to limit memory consumption on anonymous
> >> memory only, as they said the feature was available in other OSes
> >> like Solaris.
> > Please be more specific about a usecase.
>
> From that customer's point of view, page cache is more like a common
> good that can typically be shared by a number of different groups.
> Depending on which groups touch the pages first, it is possible that
> most of those pages end up disproportionately attributed to one group
> rather than the others. Anonymous memory, on the other hand, is not
> shared and so more accurately represents the memory footprint of an
> application. Of course, there are certainly cases where an application
> has large private files that consume a lot of cache pages. That is
> probably not the case for the applications used by that customer.

So you are essentially interested in page cache limiting, right? This
has been proposed several times already and has always been rejected
because it is not a good idea. I would really like to see a more
specific example where this makes sense. False sharing can certainly
happen, no question about that, but how big of a problem is it really?
Please give more specifics.

> >> To allow finer-grained control of memory, this patchset adds 2 new
> >> control knobs for the memory controller:
> >>  - memory.subset.list for specifying the type of memory to be under
> >>    control.
> >>  - memory.subset.high for the high limit of memory consumption of
> >>    that memory type.
> > Please be more specific about the semantics.
> >
> > I am really skeptical about this feature to be honest, though.
> >
>
> Please see patch 1, which has a more detailed description. This is just
> an overview for the cover letter.

No, please describe the whole design at a high level in the cover
letter. I am not going to spend time reviewing specific patches if the
whole idea is not clear beforehand. The design should be clear before
diving into technical details.

> >> For simplicity, the limit is not hierarchical and applies only to
> >> tasks in the local memory cgroup.
> > This is a no-go to begin with.
>
> The reason for doing that is to introduce as little overhead as
> possible.

We are not going to break semantics based on very vague hand-waving
about overhead.

> We can certainly make it hierarchical, but that will complicate the
> code and increase runtime overhead. Another alternative is to limit
> this feature to only leaf memory cgroups. That should be enough to
> cover what the customer is asking for and leave room for a future
> hierarchical extension, if needed.

No, this is a broken design that does not fit into the overall cgroups
design.
-- 
Michal Hocko
SUSE Labs
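
As a minimal sketch of the interface under discussion, the commands below
show how the memory.subset.list and memory.subset.high knobs described in
the quoted cover letter might be exercised from a shell. The cgroup path,
the "anon" keyword and the "2G" value format are assumptions made for
illustration only; the excerpt does not specify the accepted keywords or
value syntax, and the knobs are an RFC proposal, not an existing kernel
interface.

  # Hypothetical usage of the proposed knobs (RFC only, not a merged API).
  # Assumes cgroup v2 is mounted at /sys/fs/cgroup and a child cgroup "app".
  mkdir /sys/fs/cgroup/app

  # Select the memory type to be controlled; "anon" is an assumed keyword.
  echo anon > /sys/fs/cgroup/app/memory.subset.list

  # Set the high limit for that memory type; the byte-suffix format is
  # assumed to follow the existing memory.high convention.
  echo 2G > /sys/fs/cgroup/app/memory.subset.high

  # Move the current shell into the cgroup; per the cover letter, its
  # anonymous memory usage would then be throttled above the 2G mark.
  echo $$ > /sys/fs/cgroup/app/cgroup.procs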