Date: Wed, 9 Jan 2019 15:11:13 +0100
From: Michal Hocko
To: Kirill Tkhai
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, josef@toxicpanda.com,
    jack@suse.cz, hughd@google.com, darrick.wong@oracle.com,
    aryabinin@virtuozzo.com, guro@fb.com, mgorman@techsingularity.net,
    shakeelb@google.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 0/3] mm: Reduce IO by improving algorithm of memcg pagecache pages eviction
Message-ID: <20190109141113.GW31793@dhcp22.suse.cz>
In-Reply-To: <154703479840.32690.6504699919905946726.stgit@localhost.localdomain>
On Wed 09-01-19 15:20:18, Kirill Tkhai wrote:
> On nodes without memory overcommit, it is a common situation
> that a memcg exceeds its limit and pages from the pagecache are
> shrunk on reclaim, while the node has a lot of free memory.

Yes, that is the semantic of the hard limit. If the system is not
overcommitted then the hard limit can be used to prevent unexpected
direct reclaim from unrelated activity.

> Further access to the pages requires real device IO, while
> IO causes time delays, worse power usage, worse throughput
> for other users of the device, etc.

It is to be expected that a memory throttled usage will have this side
effect IMO.

> Cleancache is not a good solution for this problem, since
> it implies copying the page on every cleancache_put_page()
> and cleancache_get_page(). Also, it requires the introduction
> of internal per-cleancache_ops data structures to manage
> cached pages and their inode relationships, which again
> introduces overhead.
>
> This patchset introduces another solution. It introduces
> a new scheme for evicting memcg pages:
>
> 1) __remove_mapping() uncharges the unmapped page's memcg
>    and leaves the page in the pagecache on memcg reclaim;
>
> 2) putback_lru_page() places the page into the root_mem_cgroup
>    list, since its memcg is NULL. The page may be evicted
>    on global reclaim (and this will be easy, as the page is
>    not mapped, so the shrinker will shrink it with 100%
>    probability of success);
>
> 3) pagecache_get_page() charges the page to the memcg of the
>    task which takes it first.

But this also means that any hard limited memcg can fill up all the
memory and break the above assumption about the isolation from direct
reclaim. Not to mention the OOM. Or is there anything you do to
prevent that?

That being said, I do not think we want to or even can change the
semantics of the hard limit and break existing setups. I am still
interested to hear more detailed/specific usecases that might benefit
from this behavior.
Why do those users even use the hard limit at all? To protect against
anon memory leaks? Do different memcgs share the page cache heavily?

--
Michal Hocko
SUSE Labs