Date: Tue, 7 Jun 2011 17:05:40 -0400
From: Vivek Goyal
To: Greg Thelen
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    containers@lists.osdl.org, linux-fsdevel@vger.kernel.org,
    Andrea Righi, Balbir Singh, KAMEZAWA Hiroyuki, Daisuke Nishimura,
    Minchan Kim, Johannes Weiner, Ciju Rajan K, David Rientjes,
    Wu Fengguang, Dave Chinner
Subject: Re: [PATCH v8 11/12] writeback: make background writeback cgroup aware
Message-ID: <20110607210540.GB30919@redhat.com>
References: <1307117538-14317-1-git-send-email-gthelen@google.com>
    <1307117538-14317-12-git-send-email-gthelen@google.com>
    <20110607193835.GD26965@redhat.com>

On Tue, Jun 07, 2011 at 01:43:08PM -0700, Greg Thelen wrote:
> Vivek Goyal writes:
>
> > On Fri, Jun 03, 2011 at 09:12:17AM -0700, Greg Thelen wrote:
> >> When the system is under its background dirty memory threshold but a
> >> cgroup is over its background dirty memory threshold, then only write
> >> back inodes associated with the over-limit cgroup(s).
> >>
> >
> > [..]
> >> -static inline bool over_bground_thresh(void)
> >> +static inline bool over_bground_thresh(struct bdi_writeback *wb,
> >> +				       struct writeback_control *wbc)
> >>  {
> >>  	unsigned long background_thresh, dirty_thresh;
> >>  
> >>  	global_dirty_limits(&background_thresh, &dirty_thresh);
> >>  
> >> -	return (global_page_state(NR_FILE_DIRTY) +
> >> -		global_page_state(NR_UNSTABLE_NFS) > background_thresh);
> >> +	if (global_page_state(NR_FILE_DIRTY) +
> >> +	    global_page_state(NR_UNSTABLE_NFS) > background_thresh) {
> >> +		wbc->for_cgroup = 0;
> >> +		return true;
> >> +	}
> >> +
> >> +	wbc->for_cgroup = 1;
> >> +	wbc->shared_inodes = 1;
> >> +	return mem_cgroups_over_bground_dirty_thresh();
> >>  }
> >
> > Hi Greg,
> >
> > So all the memcg writeout logic works only if the system is below the
> > background limit. The moment we cross the background limit, it looks
> > like we fall back to the existing way of writing inodes?
>
> Correct. If the system is over its background limit, then the previous
> cgroup-unaware background writeback occurs. I think of the system
> limits as those of the root cgroup. If the system is over the global
> limit, then all cgroups are eligible for writeback. In this situation
> the current code does not distinguish between cgroups over or under
> their dirty background limit.
>
> Vivek Goyal writes:
> > If yes, then from a design point of view it is a little odd that as
> > long as we are below the background limit, we share the bdi between
> > different cgroups, but the moment we are above the background limit,
> > we fall back to the algorithm of sharing the disk among individual
> > inodes and forget about memory cgroups. Kind of awkward.
> >
> > I think this kind of cgroup writeback will not solve the problem, at
> > least for the CFQ IO controller, as we fall back to the old way of
> > writing back inodes the moment we cross the dirty ratio.
>
> It might make more sense to reverse the order of the checks in the
> proposed over_bground_thresh(): the new version would first check
> whether any memcg is over its limit; assuming none is over its limit,
> it would then check the global limits. If the system is over its
> background limit and some cgroups are also over their limits, then the
> over-limit cgroups would be written first, possibly getting the system
> below its limit. Does this address your concern?

Do you treat the root group like any other cgroup? If not, the above
logic can starve root group inodes, or at least make writeback unfair.
So I think it is important to treat the root group the same as the
other groups.

>
> Note: mem_cgroup_balance_dirty_pages() (patch 10/12) will perform
> foreground writeback when a memcg is above its dirty limit. This would
> offer CFQ multiple tasks issuing IO.

I don't think we can rely on this, as it will go away once IO-less
dirty throttling is merged.

Thanks
Vivek
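For illustration only, here is a rough userspace model of the reordered
check Greg proposes, with the root group sitting in the same per-group
loop as every other cgroup. It is not kernel code: struct group_model,
over_bground_thresh_model() and eligible_groups() are hypothetical
names, and the model assumes each group (root included) carries its own
dirty page count and background threshold, which glosses over how the
patch series actually computes those values.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not the kernel's data structures: one entry per
 * group, with the root group stored as an ordinary entry so it is
 * checked exactly like any other cgroup.
 */
struct group_model {
	const char *name;
	unsigned long dirty;		/* dirty pages charged to this group */
	unsigned long bg_thresh;	/* this group's background threshold */
};

/*
 * Reordered check (assumption: mirrors the proposal in prose only).
 * Per-group limits are examined first; only if no group is over its
 * limit does the global background limit decide.  *for_cgroup is set
 * to 1 when only over-limit groups should be written back, and to 0
 * when ordinary cgroup-unaware background writeback should run.
 */
static bool over_bground_thresh_model(const struct group_model *g, size_t n,
				      unsigned long global_dirty,
				      unsigned long global_bg_thresh,
				      int *for_cgroup)
{
	for (size_t i = 0; i < n; i++) {
		if (g[i].dirty > g[i].bg_thresh) {
			*for_cgroup = 1;
			return true;
		}
	}
	*for_cgroup = 0;
	return global_dirty > global_bg_thresh;
}

/*
 * Count the groups whose inodes this writeback pass may touch: every
 * group for cgroup-unaware writeback, only over-limit groups otherwise.
 */
static size_t eligible_groups(const struct group_model *g, size_t n,
			      int for_cgroup)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!for_cgroup || g[i].dirty > g[i].bg_thresh)
			count++;
	return count;
}
```

With root at 500/400, A at 100/200 and B at 300/200 (900 dirty pages
globally against a threshold of 800), the model sets for_cgroup to 1
and selects root and B. If root were left out of the per-group loop,
the same state would select only B: the per-cgroup path fires first,
and root's dirty pages would be skipped even though root is over its
limit, which is the starvation case described above.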