From: Greg Thelen
To: Tejun Heo
Cc: Konstantin Khlebnikov, Johannes Weiner, Michal Hocko, Cgroups,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Jan Kara,
    Dave Chinner, Jens Axboe, Christoph Hellwig, Li Zefan, Hugh Dickins
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma
In-reply-to: <20150205222522.GA10580@htj.dyndns.org>
Date: Thu, 05 Feb 2015 16:03:34 -0800

On Thu, Feb 05 2015, Tejun Heo wrote:

> Hey,
>
> On Thu, Feb 05, 2015 at 02:05:19PM -0800, Greg Thelen wrote:
>> > A
>> > +-B (usage=2M lim=3M min=2M hosted_usage=2M)
>> >   +-C (usage=0 lim=2M min=1M shared_usage=2M)
>> >   +-D (usage=0 lim=2M min=1M shared_usage=2M)
>> >   \-E (usage=0 lim=2M min=0)
> ...
>> Maybe, but I want to understand more about how pressure works in the
>> child.  As C (or D) allocates non-shared memory, does it perform
>> reclaim to ensure that its (C.usage + C.shared_usage < C.lim)?  Given
>
> Yes.
>
>> C's shared_usage is linked into B.LRU it wouldn't be naturally
>> reclaimable by C.  Are you thinking that charge failures on cgroups
>> with non-zero shared_usage would, as needed, induce reclaim of the
>> parent's hosted_usage?
>
> Hmmm.... I'm not really sure but why not?  If we properly account for
> the low protection when pushing inodes to the parent, I don't think
> it'd break anything.  IOW, allow the amount beyond the sum of low
> limits to be reclaimed when one of the sharers is under pressure.
>
> Thanks.

I'm not saying that it'd break anything.  I think it's required that
children perform reclaim on shared data hosted in the parent.  The
child is limited by shared_usage, so it needs the ability to reclaim
it.  So I think we're in agreement: a child will reclaim the parent's
hosted_usage when the child is charged for shared_usage.

Ideally the only parental memory reclaimed in this situation would be
shared.  But I think (though I can't claim to have followed the new
memcg philosophy discussions) that internal nodes in the cgroup tree
(i.e. parents) do not have any resources charged directly to them.
All resources are charged to leaf cgroups, which linger until those
resources are uncharged.  Thus a parent's LRUs will only contain
hosted (shared) memory.  This thankfully makes it easy to focus
parental reclaim on shared pages.  Child pressure will, unfortunately,
reclaim shared pages used by any container.  But if shared pages are
charged to all sharing containers, then reclaiming them will help
relieve pressure in the charging child.

So this is a system which charges all cgroups using a shared inode
(recharge on read) for all resident pages of that shared inode.
There's only one copy of the page in memory on just one LRU, but the
page may be charged to multiple containers' (shared_)usage.
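To make that bookkeeping concrete, here is a minimal userspace C sketch
of such recharge-on-read accounting.  Every name in it (struct cg,
charge_shared, reclaim_hosted, the field names) is hypothetical; this is
a toy model of the scheme under discussion, not a proposed kernel
implementation.  One resident copy counts once in the host's
hosted_usage, every sharer is charged in shared_usage, and a child under
pressure falls back to reclaiming the host's hosted LRU:

/*
 * Toy userspace model of recharge-on-read accounting; all names are
 * hypothetical and this is a sketch of the bookkeeping, not kernel
 * code.  Invariant: a shared page is resident once, on the hosting
 * parent's LRU (hosted_usage), but is charged to the shared_usage of
 * every sharing child.
 */
#include <stdbool.h>

struct cg {
	struct cg *parent;
	long usage;		/* private pages charged directly to us */
	long shared_usage;	/* pages hosted in the parent, charged to us */
	long hosted_usage;	/* shared pages we host for our children */
	long limit;
};

/*
 * Stand-in for reclaim of the host's hosted (shared) LRU.  A real
 * implementation would also uncharge every sharer; the caller below
 * approximates that for the charging child only.
 */
static long reclaim_hosted(struct cg *host, long want)
{
	long freed = want < host->hosted_usage ? want : host->hosted_usage;

	host->hosted_usage -= freed;
	return freed;
}

/*
 * Charge one sharing child for nr bytes of shared file pages.  The
 * shared pages sit on the parent's LRU, so on pressure the child must
 * induce reclaim of the parent's hosted_usage rather than scan its
 * own LRU.
 */
static bool charge_shared(struct cg *child, long nr)
{
	long need;

	while ((need = child->usage + child->shared_usage + nr -
		       child->limit) > 0) {
		long freed = reclaim_hosted(child->parent, need);

		if (freed == 0)
			return false;	/* host LRU empty: charge fails */
		if (freed > child->shared_usage)
			freed = child->shared_usage;
		child->shared_usage -= freed;	/* uncharge what we lost */
	}
	child->shared_usage += nr;
	return true;
}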
Perhaps I missed it, but what happens when a child's limit is
insufficient to accept all pages shared by its siblings?  Example
starting with 2M cached of a shared file:

	A
	+-B (usage=2M lim=3M hosted_usage=2M)
	  +-C (usage=0 lim=2M shared_usage=2M)
	  +-D (usage=0 lim=2M shared_usage=2M)
	  \-E (usage=0 lim=1M shared_usage=0)

If E faults in a new 4K page within the shared file, then E is a
sharing participant, so it'd be charged the 2M+4K, which pushes E over
its limit.
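Plugging this tree into the toy model sketched earlier (again
hypothetical names, illustration only) shows the failure mode being
asked about: E's charge of 2M+4K against a 1M limit cannot succeed no
matter how much of B's hosted LRU is reclaimed:

/* Continues the toy model above; assumes its definitions are in scope. */
#include <stdio.h>

int main(void)
{
	struct cg B = { .usage = 2L << 20, .limit = 3L << 20,
			.hosted_usage = 2L << 20 };
	struct cg C = { .parent = &B, .limit = 2L << 20,
			.shared_usage = 2L << 20 };
	struct cg D = { .parent = &B, .limit = 2L << 20,
			.shared_usage = 2L << 20 };
	struct cg E = { .parent = &B, .limit = 1L << 20 };

	(void)C; (void)D;	/* present only to mirror the picture */

	/* E joins the sharers: charged the full 2M + 4K footprint. */
	bool ok = charge_shared(&E, (2L << 20) + 4096);

	printf("E charge %s; B.hosted_usage now %ldK\n",
	       ok ? "succeeded" : "failed", B.hosted_usage >> 10);
	return 0;
}

In the toy this prints "E charge failed; B.hosted_usage now 0K": before
failing, E's doomed charge attempt also drained every hosted page that C
and D were using, which is exactly the kind of fallout the question is
probing.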