Date: Tue, 15 Mar 2011 12:20:07 -0400
From: Vivek Goyal
To: Jan Kara
Cc: Greg Thelen, Andrew Morton, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, containers@lists.osdl.org,
    linux-fsdevel@vger.kernel.org, Andrea Righi, Balbir Singh,
    KAMEZAWA Hiroyuki, Daisuke Nishimura, Minchan Kim, Johannes Weiner,
    Ciju Rajan K, David Rientjes, Wu Fengguang, Chad Talbott,
    Justin TerAvest
Subject: Re: [PATCH v6 8/9] memcg: check memcg dirty limits in page writeback
Message-ID: <20110315162007.GB2587@redhat.com>
References: <1299869011-26152-1-git-send-email-gthelen@google.com>
    <1299869011-26152-9-git-send-email-gthelen@google.com>
    <20110314175408.GE31120@redhat.com>
    <20110314211002.GD4998@quack.suse.cz>
In-Reply-To: <20110314211002.GD4998@quack.suse.cz>

On Mon, Mar 14, 2011 at 10:10:03PM +0100, Jan Kara wrote:
> On Mon 14-03-11 13:54:08, Vivek Goyal wrote:
> > On Fri, Mar 11, 2011 at 10:43:30AM -0800, Greg Thelen wrote:
> > > If the current process is in a non-root memcg, then
> > > balance_dirty_pages() will consider the memcg dirty limits as well as
> > > the system-wide limits.  This allows different cgroups to have distinct
> > > dirty limits which trigger direct and background writeback at different
> > > levels.
> > >
> > > If called with a mem_cgroup, then throttle_vm_writeout() queries the
> > > given cgroup for its dirty memory usage limits.
> > >
> > > Signed-off-by: Andrea Righi
> > > Signed-off-by: Greg Thelen
> > > Acked-by: KAMEZAWA Hiroyuki
> > > Acked-by: Wu Fengguang
> > > ---
> > > Changelog since v5:
> > > - Simplified this change by using mem_cgroup_balance_dirty_pages() rather than
> > >   cramming the somewhat different logic into balance_dirty_pages().  This means
> > >   the global (non-memcg) dirty limits are not passed around in the
> > >   struct dirty_info, so there's less change to existing code.
> >
> > Yes there is less change to existing code but now we also have a separate
> > throttling logic for cgroups.
> >
> > I thought that we are moving in the direction of IO less throttling
> > where bdi threads always do the IO and Jan Kara also implemented the
> > logic to distribute the finished IO pages uniformly across the waiting
> > threads.
>   Yes, we'd like to avoid doing IO from balance_dirty_pages(). But if the
> logic in cgroups specific part won't get too fancy (which it doesn't seem
> to be the case currently), it shouldn't be too hard to convert it to the new
> approach.
>
> We can talk about it at LSF but at least with my approach to IO-less
> balance_dirty_pages() it would be easy to convert cgroups throttling to
> the new way. With Fengguang's approach it might be a bit harder since he
> computes a throughput and from that necessary delay for a throttled task
> but with cgroups that is impossible to compute so he'd have to add some
> looping if we didn't write enough pages from the cgroup yet. But still it
> would be reasonably doable AFAICT.
>
> > Keeping it separate for cgroups, reduces the complexity but also forks
> > off the balancing logic for root and other cgroups. So if Jan Kara's
> > changes go in, it automatically does not get used for memory cgroups.
> >
> > Not sure how good an idea it is to use a separate throttling logic
> > for non-root cgroups.
>   Yeah, it looks a bit odd.
> I'd think that we could just cap
> task_dirty_limit() by a value computed from a cgroup limit and be done
> with that but I probably miss something...

I think the previous implementation did something similar. Currently the
dirty limit is per_bdi/per_task; they made it per_cgroup/per_bdi/per_task.
This new version tries to simplify things by keeping the mem cgroup
throttling logic separate.

> Sure there is also a different
> background limit but that's broken anyway because a flusher thread will
> quickly stop doing writeback if global background limit is not exceeded.
> But that's a separate topic so I'll reply with this to a more appropriate
> email ;)

I think the last patch in the series (patch 9) takes care of that. In the
case of mem_cgroup writeback, it forces the flusher thread to keep writing
until we are below the background ratio of the cgroup (and not the global
background ratio).

Thanks
Vivek
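
For illustration only, the two ideas discussed in this thread can be
sketched in plain userspace C: capping a per-task dirty limit by a value
derived from the cgroup limit, and judging background writeback against
the cgroup's own background threshold instead of the global one. Every
name below (struct dirty_snapshot, capped_task_dirty_limit(),
flusher_should_continue()) is hypothetical. This is not the kernel's
task_dirty_limit(), and it is not the code from this patch series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of dirty-page accounting, all values in pages. */
struct dirty_snapshot {
	unsigned long nr_dirty;          /* pages currently dirty */
	unsigned long dirty_thresh;      /* foreground (throttling) limit */
	unsigned long background_thresh; /* background writeback limit */
};

/*
 * Cap a per-task limit by a cgroup-derived limit: whichever is smaller
 * wins, so a task in a small cgroup is throttled earlier.
 */
static unsigned long capped_task_dirty_limit(unsigned long task_limit,
					     unsigned long memcg_limit)
{
	return task_limit < memcg_limit ? task_limit : memcg_limit;
}

/* True while the given domain is above its background threshold. */
static bool over_background(const struct dirty_snapshot *s)
{
	return s->nr_dirty > s->background_thresh;
}

/*
 * Decide whether a flusher should keep writing. For cgroup-initiated
 * writeback (memcg != NULL) only the cgroup's background threshold is
 * consulted, so the flusher does not stop early just because the global
 * background limit is not exceeded.
 */
static bool flusher_should_continue(const struct dirty_snapshot *global,
				    const struct dirty_snapshot *memcg)
{
	if (memcg)
		return over_background(memcg);
	return over_background(global);
}
```

With a global snapshot below its background threshold and a cgroup
snapshot above its own, flusher_should_continue() returns false when
passed NULL for the cgroup and true when passed the cgroup, which is the
behaviour described for patch 9 above.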