Date: Mon, 8 Nov 2010 23:45:24 +0800
From: Wu Fengguang
To: Johannes Weiner
Cc: Minchan Kim, Greg Thelen, Andrew Morton, Dave Young, Andrea Righi,
    KAMEZAWA Hiroyuki, Daisuke Nishimura, Balbir Singh,
    "linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: Re: memcg writeout throttling, was: [patch 4/4] memcg: use native word page statistics counters
Message-ID: <20101108154524.GA9530@localhost>
References: <1288973333-7891-1-git-send-email-minchan.kim@gmail.com>
 <20101106010357.GD23393@cmpxchg.org>
 <20101107215030.007259800@cmpxchg.org>
 <20101107220353.964566018@cmpxchg.org>
 <20101108093715.GJ23393@cmpxchg.org>
In-Reply-To: <20101108093715.GJ23393@cmpxchg.org>

On Mon, Nov 08, 2010 at 05:37:16PM +0800, Johannes Weiner wrote:
> On Mon, Nov 08, 2010 at 09:07:35AM +0900, Minchan Kim wrote:
> > BTW, let me ask a question.
> > dirty_writeback_pages seems to depend on mem_cgroup_page_stat's
> > result (i.e., negative) to separate global and memcg.
> > But mem_cgroup_page_stat could return a negative value by per-cpu as
> > well as root cgroup.
> > If I understand right, isn't it a problem?
>
> Yes, the numbers are not reliable and may be off by some. It appears
> to me that the only sensible interpretation of a negative sum is to
> assume zero, though.
> So to be honest, I don't understand the fallback
> to global state when the local state fluctuates around low values.

Agreed. It does not make sense to compare values from different
domains.

The bdi stats use percpu_counter_sum_positive(), which never returns
negative values. It may be suitable for memcg page counts, too.

> This function is also only used in throttle_vm_writeout(), where the
> outcome is compared to the global dirty threshold. So using the
> number of writeback pages _from the current cgroup_ and falling back
> to global writeback pages when this number is low makes no sense to me
> at all.
>
> It looks like it should rather compare the cgroup state with the cgroup
> limit, and the global state with the global limit.

Right.

> Can somebody explain the reasoning behind this? And in case it makes
> sense after all, put a comment into this function?

It seems a better match to test sc->mem_cgroup rather than
mem_cgroup_from_task(current). The latter can cause mismatches: when
someone changes the memcg limits and hence triggers memcg reclaim, the
current task is actually the (unrelated) shell. It's also possible for
a memcg task to trigger _global_ direct reclaim.

Thanks,
Fengguang