Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754427AbbLPCoi (ORCPT ); Tue, 15 Dec 2015 21:44:38 -0500 Received: from mgwkm03.jp.fujitsu.com ([202.219.69.170]:25163 "EHLO mgwkm03.jp.fujitsu.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751406AbbLPCoh (ORCPT ); Tue, 15 Dec 2015 21:44:37 -0500 X-SecurityPolicyCheck: OK by SHieldMailChecker v2.3.2 X-SHieldMailCheckerPolicyVersion: FJ-ISEC-20150223 X-SHieldMailCheckerMailID: 949069bb883e4bf5a25d450675adfe65 Subject: Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2 To: Vladimir Davydov References: <265d8fe623ed2773d69a26d302eb31e335377c77.1449742560.git.vdavydov@virtuozzo.com> <20151214153037.GB4339@dhcp22.suse.cz> <20151214194258.GH28521@esperanza> <566F8781.80108@jp.fujitsu.com> <20151215110219.GJ28521@esperanza> Cc: Michal Hocko , Andrew Morton , Johannes Weiner , linux-mm@kvack.org, linux-kernel@vger.kernel.org From: Kamezawa Hiroyuki Message-ID: <5670CFFA.3060309@jp.fujitsu.com> Date: Wed, 16 Dec 2015 11:44:10 +0900 User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0 MIME-Version: 1.0 In-Reply-To: <20151215110219.GJ28521@esperanza> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit X-TM-AS-MML: disable Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5402 Lines: 114 On 2015/12/15 20:02, Vladimir Davydov wrote: > On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote: >> On 2015/12/15 4:42, Vladimir Davydov wrote: >>> On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote: >>>> On Thu 10-12-15 14:39:14, Vladimir Davydov wrote: >>>>> In the legacy hierarchy we charge memsw, which is dubious, because: >>>>> >>>>> - memsw.limit must be >= memory.limit, so it is impossible to limit >>>>> swap usage less than memory usage. Taking into account the fact that >>>>> the primary limiting mechanism in the unified hierarchy is >>>>> memory.high while memory.limit is either left unset or set to a very >>>>> large value, moving memsw.limit knob to the unified hierarchy would >>>>> effectively make it impossible to limit swap usage according to the >>>>> user preference. >>>>> >>>>> - memsw.usage != memory.usage + swap.usage, because a page occupying >>>>> both swap entry and a swap cache page is charged only once to memsw >>>>> counter. As a result, it is possible to effectively eat up to >>>>> memory.limit of memory pages *and* memsw.limit of swap entries, which >>>>> looks unexpected. >>>>> >>>>> That said, we should provide a different swap limiting mechanism for >>>>> cgroup2. >>>>> This patch adds mem_cgroup->swap counter, which charges the actual >>>>> number of swap entries used by a cgroup. It is only charged in the >>>>> unified hierarchy, while the legacy hierarchy memsw logic is left >>>>> intact. >>>> >>>> I agree that the previous semantic was awkward. The problem I can see >>>> with this approach is that once the swap limit is reached the anon >>>> memory pressure might spill over to other and unrelated memcgs during >>>> the global memory pressure. I guess this is what Kame referred to as >>>> anon would become mlocked basically. This would be even more of an issue >>>> with resource delegation to sub-hierarchies because nobody will prevent >>>> setting the swap amount to a small value and use that as an anon memory >>>> protection. >>> >>> AFAICS such anon memory protection has a side-effect: real-life >>> workloads need page cache to run smoothly (at least for mapping >>> executables). Disabling swapping would switch pressure to page caches, >>> resulting in performance degradation. So, I don't think per memcg swap >>> limit can be abused to boost your workload on an overcommitted system. >>> >>> If you mean malicious users, well, they already have plenty ways to eat >>> all available memory up to the hard limit by creating unreclaimable >>> kernel objects. >>> >> "protect anon" user's malicious degree is far lower than such cracker like users. > > What do you mean by "malicious degree"? What is such a user trying to > achieve? Killing the system? Well, there are much more effective ways to > do so. Or does it want to exploit a system specific feature to get > benefit for itself? If so, it will hardly win by mlocking all anonymous > memory, because this will result in higher pressure exerted upon its > page cache and dcache, which normal workloads just can't get along > without. > I wanted to say almost all application developers want to set swap.limit=0 if allowed. So, it's a usual people who can kill the system if swap imbalance is allowed. >> >>> Anyway, if you don't trust a container you'd better set the hard memory >>> limit so that it can't hurt others no matter what it runs and how it >>> tweaks its sub-tree knobs. >>> >> >> Limiting swap can easily cause "OOM-Killer even while there are >> available swap" with easy mistake. > > What do you mean by "easy mistake"? Misconfiguration? If so, it's a lame > excuse IMO. Admin should take system configuration seriously. If the > host is not overcommitted, it's trivial. Otherwise, there's always a > chance that things will go south, so it's not going to be easy. It's up > to admin to analyze risks and set limits accordingly. Exporting knobs > with clear meaning is the best we can do here. swap.max is one such knob > It defines maximal usage of swap resource. Allowing to breach it just > does not add up. > >> Can't you add "swap excess" switch to sysctl to allow global memory >> reclaim can ignore swap limitation ? > > I'd be opposed to it, because this would obscure the user API. OTOH, a > kind of swap soft limit (swap.high?) might be considered. I'm not sure > if it's really necessary though, because all arguments for it do not > look convincing to me for now. So, personally, I would refrain from > implementing it until it is really called for by users of cgroup v2. > Considering my customers, running OOM-Killer while there are free swap space is system's error rather than their misconfiguration. BTW, mlock() requires CAP_IPC_LOCK. please set default unlimited and check capability at setting swap limit, at least. Thanks, -Kame > Thanks, > Vladimir > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/