Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751711Ab1CPVwl (ORCPT ); Wed, 16 Mar 2011 17:52:41 -0400 Received: from zene.cmpxchg.org ([85.214.230.12]:56881 "EHLO zene.cmpxchg.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751143Ab1CPVwd (ORCPT ); Wed, 16 Mar 2011 17:52:33 -0400 Date: Wed, 16 Mar 2011 22:52:14 +0100 From: Johannes Weiner To: Greg Thelen Cc: Vivek Goyal , Andrew Morton , linux-kernel@vger.kernel.org, linux-mm@kvack.org, containers@lists.osdl.org, linux-fsdevel@vger.kernel.org, Andrea Righi , Balbir Singh , KAMEZAWA Hiroyuki , Daisuke Nishimura , Minchan Kim , Ciju Rajan K , David Rientjes , Wu Fengguang , Chad Talbott , Justin TerAvest Subject: Re: [PATCH v6 0/9] memcg: per cgroup dirty page accounting Message-ID: <20110316215214.GO2140@cmpxchg.org> References: <1299869011-26152-1-git-send-email-gthelen@google.com> <20110311171006.ec0d9c37.akpm@linux-foundation.org> <20110314202324.GG31120@redhat.com> <20110315184839.GB5740@redhat.com> <20110316131324.GM2140@cmpxchg.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 3038 Lines: 63 On Wed, Mar 16, 2011 at 02:19:26PM -0700, Greg Thelen wrote: > On Wed, Mar 16, 2011 at 6:13 AM, Johannes Weiner wrote: > > On Tue, Mar 15, 2011 at 02:48:39PM -0400, Vivek Goyal wrote: > >> I think even for background we shall have to implement some kind of logic > >> where inodes are selected by traversing memcg->lru list so that for > >> background write we don't end up writting too many inodes from other > >> root group in an attempt to meet the low background ratio of memcg. > >> > >> So to me it boils down to coming up a new inode selection logic for > >> memcg which can be used both for background as well as foreground > >> writes. This will make sure we don't end up writting pages from the > >> inodes we don't want to. > > > > Originally for struct page_cgroup reduction, I had the idea of > > introducing something like > > > > ? ? ? ?struct memcg_mapping { > > ? ? ? ? ? ? ? ?struct address_space *mapping; > > ? ? ? ? ? ? ? ?struct mem_cgroup *memcg; > > ? ? ? ?}; > > > > hanging off page->mapping to make memcg association no longer per-page > > and save the pc->memcg linkage (it's not completely per-inode either, > > multiple memcgs can still refer to a single inode). > > > > We could put these descriptors on a per-memcg list and write inodes > > from this list during memcg-writeback. > > > > We would have the option of extending this structure to contain hints > > as to which subrange of the inode is actually owned by the cgroup, to > > further narrow writeback to the right pages - iff shared big files > > become a problem. > > > > Does that sound feasible? > > If I understand your memcg_mapping proposal, then each inode could > have a collection of memcg_mapping objects representing the set of > memcg that were charged for caching pages of the inode's data. When a > new file page is charged to a memcg, then the inode's set of > memcg_mapping would be scanned to determine if current's memcg is > already in the memcg_mapping set. If this is the first page for the > memcg within the inode, then a new memcg_mapping would be allocated > and attached to the inode. The memcg_mapping may be reference counted > and would be deleted when the last inode page for a particular memcg > is uncharged. Dead-on. Well, on which side you put the list - a per-memcg list of inodes, or a per-inode list of memcgs - really depends on which way you want to do the lookups. But this is the idea, yes. > page->mapping = &memcg_mapping > inode->i_mapping = collection of memcg_mapping, grows/shrinks with [un]charge If the memcg_mapping list (or hash-table for quick find-or-create?) was to be on the inode side, I'd put it in struct address_space, since this is all about page cache, not so much an fs thing. Still, correct in general. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/