Date: Sun, 26 Apr 2009 19:19:21 +0200
From: Andrea Righi
To: Gui Jianfeng
Cc: Paul Menage, Balbir Singh, KAMEZAWA Hiroyuki, agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk, baramsori72@gmail.com, Carl Henrik Lunde, dave@linux.vnet.ibm.com, Divyesh Shah, eric.rannaud@gmail.com, fernando@oss.ntt.co.jp, Hirokazu Takahashi, Li Zefan, matt@bluehost.com, dradford@bluehost.com, ngupta@google.com, randy.dunlap@oracle.com, roberto@unbit.it, Ryo Tsuruta, Satoshi UCHIDA, subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp, Nauman Rafique, fchecconi@gmail.com, paolo.valente@unimore.it, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure
Message-ID: <20090426171920.GB2336@linux>
References: <1240090712-1058-1-git-send-email-righi.andrea@gmail.com> <1240090712-1058-4-git-send-email-righi.andrea@gmail.com> <49F11FBD.3070705@cn.fujitsu.com> <20090424083116.GB8535@linux> <49F1830F.8020609@cn.fujitsu.com>
In-Reply-To:
 <49F1830F.8020609@cn.fujitsu.com>

On Fri, Apr 24, 2009 at 05:14:55PM +0800, Gui Jianfeng wrote:
> Andrea Righi wrote:
> > On Fri, Apr 24, 2009 at 10:11:09AM +0800, Gui Jianfeng wrote:
> >> Andrea Righi wrote:
> >>> Dirty pages in the page cache can be processed asynchronously by kernel
> >>> threads (pdflush) using a writeback policy. For this reason the real
> >>> writes to the underlying block devices occur in a different IO context
> >>> from the task that originally generated the dirty pages involved in
> >>> the IO operation. This makes tracking and throttling writeback IO more
> >>> complicated than synchronous IO.
> >>>
> >>> The page_cgroup infrastructure, currently available only for the memory
> >>> cgroup controller, can be used to store the owner of each page and
> >>> appropriately track the writeback IO. This information is encoded in
> >>> page_cgroup->flags.
> >> You encode the id in page_cgroup->flags; if a cgroup gets removed, IMHO,
> >> you should remove the corresponding id from flags.
> >
> > OK, the same ID could be reused by another cgroup. I think this
> > should happen very rarely, because IDs are recycled slowly anyway.
> >
> > What about simply executing a sys_sync() when an io-throttle cgroup is
> > removed? If we're going to remove a cgroup, no additional dirty pages
> > will be generated by this cgroup, because it must be empty. And the sync
> > would flush the old dirty pages back to disk (for those pages the
> > cgroup ID will simply be ignored).
> >
> >> One more thing: if a task is moving from one cgroup to another, the id
> >> in flags also needs to be changed.
> >
> > I do not agree here. Even if a task is moving from one cgroup to another,
> > the cgroup that generated the dirty page is always the old one.
> > Remember that we want to save the cgroup's identity in this case, not
> > the task's.
>
> If the task moves to a new cgroup, the dirty pages generated in the old
> group still use the old id. When these dirty pages are written back to
> disk, the corresponding bios will be delayed according to the old group's
> bandwidth limit. Am I right? I think we should use the new bandwidth limit
> when the actual IO happens, so we need to use the new id for these pages.
> But I think implementing this would be very complicated. :)

Right, but as I said, statistics are per cgroup, not per task. And even if
a task moves to another cgroup, IMHO it is correct to use the old cgroup id
to write back the dirty pages, because that IO activity was actually
generated before the task moved, so the old rules should apply. I don't see
a strong motivation to change this behaviour and complicate/slow down the
current implementation.

Thanks,
-Andrea