Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752008Ab1CKDP0 (ORCPT ); Thu, 10 Mar 2011 22:15:26 -0500 Received: from mx1.redhat.com ([209.132.183.28]:38214 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751077Ab1CKDPY (ORCPT ); Thu, 10 Mar 2011 22:15:24 -0500 Date: Thu, 10 Mar 2011 22:15:04 -0500 From: Vivek Goyal To: KAMEZAWA Hiroyuki Cc: Dave Chinner , Chris Mason , Andreas Dilger , Justin TerAvest , m-ikeda , jaxboe , linux-kernel , ryov , taka , "righi.andrea" , guijianfeng , balbir , ctalbott , nauman , mrubin , linux-fsdevel Subject: Re: [RFC] Storing cgroup id in page->private (Was: Re: [RFC] [PATCH 0/6] Provide cgroup isolation for buffered writes.) Message-ID: <20110311031503.GC11710@redhat.com> References: <20110310191115.GG29464@redhat.com> <20110310194106.GH29464@redhat.com> <1299791640-sup-1874@think> <3EC7D30A-B8F7-416B-8B1D-A19350C57D82@dilger.ca> <20110310213832.GK29464@redhat.com> <1299793340-sup-9066@think> <20110311014618.GC15097@dastard> <20110311021531.GA11710@redhat.com> <20110311115235.3663792c.kamezawa.hiroyu@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110311115235.3663792c.kamezawa.hiroyu@jp.fujitsu.com> User-Agent: Mutt/1.5.21 (2010-09-15) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5247 Lines: 159 On Fri, Mar 11, 2011 at 11:52:35AM +0900, KAMEZAWA Hiroyuki wrote: > On Thu, 10 Mar 2011 21:15:31 -0500 > Vivek Goyal wrote: > > > > IMO, if you really need some per-page information, then just put it > > > in the struct page - you can't hide the memory overhead just by > > > having the filesystem to store it for you. That just adds > > > unnecessary complexity... > > > > Ok. I guess adding anything to struct page is going to be hard and > > we might have to fall back to looking into using page_cgroup for > > tracking page state. I was trying to explore the ways so that we don't > > have to instantiate whole page_cgroup structure just for trying > > to figure out who dirtied the page. > > > > Is this bad ? > == Sounds like an interesting idea. I am primarily concered about the radix tree node size increase. Not sure how big a concern this is. Also tracking is useful for two things. - Proportinal IO - IO throttling For proportional IO, anyway we have to use it with memory controller to control per cgroup dirty ratio so storing info in page_cgroup should not hurt. The only other case where dependence on page_cgroup hurts is IO throttling where IO controller does not really need memory cgroup controller (I hope so). But we are still not sure if throttling IO at device level is a good idea and how to resolve issues related to priority inversion. But this definitely sounds better than adding a new field in page struct as I am assuming that it overall is going to consume less memory. Thanks Vivek > > At handling ASYNC I/O in blkio layer, it's unknown that who dirtied the page. > This lack of information makes impossible to throttole Async I/O per > cgroup in blkio queue layer. > > This patch records the information into radix-tree and preserve the > information. There is no 'clear' operation because all I/O starts when > the page is marked as DIRTY. > > Signed-off-by: KAMEZAWA Hiroyuki > --- > include/linux/radix-tree.h | 3 +++ > lib/radix-tree.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 45 insertions(+) > > Index: mmotm-Mar10/include/linux/radix-tree.h > =================================================================== > --- mmotm-Mar10.orig/include/linux/radix-tree.h > +++ mmotm-Mar10/include/linux/radix-tree.h > @@ -58,12 +58,14 @@ struct radix_tree_root { > unsigned int height; > gfp_t gfp_mask; > struct radix_tree_node __rcu *rnode; > + int iohint; > }; > > #define RADIX_TREE_INIT(mask) { \ > .height = 0, \ > .gfp_mask = (mask), \ > .rnode = NULL, \ > + .iohint = 0, \ > } > > #define RADIX_TREE(name, mask) \ > @@ -74,6 +76,7 @@ do { \ > (root)->height = 0; \ > (root)->gfp_mask = (mask); \ > (root)->rnode = NULL; \ > + (root)->iohint = 0; \ > } while (0) > > /** > Index: mmotm-Mar10/lib/radix-tree.c > =================================================================== > --- mmotm-Mar10.orig/lib/radix-tree.c > +++ mmotm-Mar10/lib/radix-tree.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > > > #ifdef __KERNEL__ > @@ -51,6 +52,9 @@ struct radix_tree_node { > struct rcu_head rcu_head; > void __rcu *slots[RADIX_TREE_MAP_SIZE]; > unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS]; > +#ifdef CONFIG_BLK_CGROUP > + unsigned short iohint[RADIX_TREE_MAP_SIZE]; > +#endif > }; > > struct radix_tree_path { > @@ -473,6 +477,8 @@ void *radix_tree_tag_set(struct radix_tr > offset = (index >> shift) & RADIX_TREE_MAP_MASK; > if (!tag_get(slot, tag, offset)) > tag_set(slot, tag, offset); > + if (height == 1 && slot && tag == PAGECACHE_TAG_DIRTY) > + blkio_record_hint(&slot->iohint[offset]); > slot = slot->slots[offset]; > BUG_ON(slot == NULL); > shift -= RADIX_TREE_MAP_SHIFT; > @@ -1418,3 +1425,38 @@ void __init radix_tree_init(void) > radix_tree_init_maxindex(); > hotcpu_notifier(radix_tree_callback, 0); > } > + > +#ifdef CONFIG_BLK_CGROUP > + > +unsigned short radix_tree_lookup_iohint(struct radix_tree_root *root, > + int index) > +{ > + unsigned int height, shift; > + struct radix_tree_node *node; > + > + node = rcu_redereference(root->rnode); > + if (node == NULL) > + return 0; > + if (!radix_tree_is_indirect_ptr(node)) > + return root->iohint; > + node = radxi_tree_indirect_to_ptr(node); > + > + height = node->height; > + if (index > radix_tree_maxindex(height)) > + return 0; > + shift = (height - 1) * RADIX_TREE_MAP_SHIFT; > + for ( ; ; ) { > + int offset; > + > + if (node == NULL) > + return 0; > + offset = (index >> shift) & RADIX_TREE_MAP_MASK; > + if (height == 1) > + return node->iohint[offset]; > + node = rcu_rereference(node->slots[offset]); > + shift -= RADIX_TREE_MAP_SHIFT; > + height--; > + } > +} > +EXPORT_SYMBOL(radix_tree_lookup_iohint); > +#endif -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/