From: Konstantin Khlebnikov
To: Tejun Heo
Cc: Greg Thelen, Konstantin Khlebnikov, Johannes Weiner, Michal Hocko,
    Cgroups, "linux-mm@kvack.org", "linux-kernel@vger.kernel.org",
    Jan Kara, Dave Chinner, Jens Axboe, Christoph Hellwig, Li Zefan,
    Hugh Dickins
Subject: Re: [RFC] Making memcg track ownership per address_space or anon_vma
Date: Thu, 12 Feb 2015 01:57:04 +0400
In-Reply-To: <20150211214650.GA11920@htj.duckdns.org>
References: <20150205222522.GA10580@htj.dyndns.org>
    <20150206141746.GB10580@htj.dyndns.org>
    <20150207143839.GA9926@htj.dyndns.org>
    <20150211021906.GA21356@htj.duckdns.org>
    <20150211203359.GF21356@htj.duckdns.org>
    <20150211214650.GA11920@htj.duckdns.org>

On Thu, Feb 12, 2015 at 12:46 AM, Tejun Heo wrote:
> Hello,
>
> On Thu, Feb 12, 2015 at 12:22:34AM +0300, Konstantin Khlebnikov wrote:
>> > Yeah, available memory to the matching memcg and the number of dirty
>> > pages in it.  It's gonna work the same way as the global case just
>> > scoped to the cgroup.
>>
>> That might be a problem: all dirty pages accounted to cgroup must be
>> reachable for its own personal writeback or balance-dirty-pages will be
>> unable to satisfy memcg dirty memory thresholds. I've done accounting
>
> Yeah, it would.  Why wouldn't it?

How do you plan to do per-memcg/blkcg writeback for balance-dirty-pages?
Or are you thinking only about separating the writeback flow into blkio
cgroups without actual inode filtering?
I mean delaying inode writeback and keeping dirty pages for as long as
possible while their cgroups are far from the threshold.

>
>> for per-inode owner, but there is another option: shared inodes might be
>> handled differently and will be available for all (or related) cgroup
>> writebacks.
>
> I'm not following you at all.  The only reason this scheme can work is
> because we exclude persistent shared write cases.  As the whole thing
> is based on that assumption, special casing shared inodes doesn't make
> any sense.  Doing things like allowing all cgroups to write shared
> inodes without getting memcg on-board almost immediately breaks
> pressure propagation while making shared writes a lot more attractive
> and increasing implementation complexity substantially.  Am I missing
> something?
>
>> Another side is that reclaimer now (mostly?) never triggers pageout.
>> Memcg reclaimer should do something if it finds a shared dirty page:
>> either move it into the right cgroup or make that inode reachable for
>> memcg writeback. I've sent a patch which marks shared dirty inodes
>> with flag I_DIRTY_SHARED or so.
>
> It *might* make sense for memcg to drop pages being dirtied which
> don't match the currently associated blkcg of the inode; however,
> again, as we're basically declaring that shared writes aren't
> supported, I'm skeptical about the usefulness.
>
> Thanks.
>
> --
> tejun
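
Editorial illustration: the reply's idea of delaying inode writeback while the owning cgroup is still far from its dirty threshold can be sketched roughly as below. This is a toy model, not kernel code and not the patch mentioned in the thread. The class `CgroupDirtyState`, the functions `should_write_back` and `inodes_to_flush`, and the `start_ratio` cut-off are all hypothetical names and values chosen for illustration. It assumes per-inode ownership, where each dirty inode is charged to exactly one cgroup.

```python
from dataclasses import dataclass


@dataclass
class CgroupDirtyState:
    """Hypothetical per-cgroup dirty accounting (not a kernel structure)."""
    name: str
    dirty_pages: int   # dirty pages currently charged to this cgroup
    dirty_limit: int   # dirty threshold for this cgroup, in pages


def should_write_back(cg, start_ratio=0.5):
    """True once the cgroup's dirty pages reach start_ratio of its limit.

    start_ratio is an assumed tunable; below it, writeback is delayed and
    dirty pages are kept in memory.
    """
    return cg.dirty_pages >= cg.dirty_limit * start_ratio


def inodes_to_flush(inode_owner, cgroups, start_ratio=0.5):
    """Select inodes whose owning cgroup is close to its dirty threshold.

    inode_owner maps an inode number to the name of the single cgroup that
    owns it (the per-inode ownership assumption).
    """
    by_name = {cg.name: cg for cg in cgroups}
    return sorted(
        ino
        for ino, owner in inode_owner.items()
        if should_write_back(by_name[owner], start_ratio)
    )
```

With one cgroup at 90 of 100 dirty pages and another at 10 of 100, only the inodes owned by the first would be selected. An inode written by several cgroups has no single owner in this model, which is the shared-write case the thread argues about.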