Date: Sun, 26 Apr 2009 19:19:21 +0200
From: Andrea Righi <righi.andrea@gmail.com>
To: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>, Balbir Singh <balbir@linux.vnet.ibm.com>,
       KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>, agk@sourceware.org,
       akpm@linux-foundation.org, axboe@kernel.dk, baramsori72@gmail.com,
       Carl Henrik Lunde <chlunde@ping.uio.no>, dave@linux.vnet.ibm.com,
       Divyesh Shah <dpshah@google.com>, eric.rannaud@gmail.com,
       fernando@oss.ntt.co.jp, Hirokazu Takahashi <taka@valinux.co.jp>,
       Li Zefan <lizf@cn.fujitsu.com>, matt@bluehost.com,
       dradford@bluehost.com, ngupta@google.com, randy.dunlap@oracle.com,
       roberto@unbit.it, Ryo Tsuruta <ryov@valinux.co.jp>,
       Satoshi UCHIDA <s-uchida@ap.jp.nec.com>, subrata@linux.vnet.ibm.com,
       yoshikawa.takuya@oss.ntt.co.jp, Nauman Rafique <nauman@google.com>,
       fchecconi@gmail.com, paolo.valente@unimore.it,
       containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] page_cgroup: provide a generic page tracking
	infrastructure
Message-ID: <20090426171920.GB2336@linux>
References: <1240090712-1058-1-git-send-email-righi.andrea@gmail.com> <1240090712-1058-4-git-send-email-righi.andrea@gmail.com> <49F11FBD.3070705@cn.fujitsu.com> <20090424083116.GB8535@linux> <49F1830F.8020609@cn.fujitsu.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <49F1830F.8020609@cn.fujitsu.com>
User-Agent: Mutt/1.5.18 (2008-05-17)

On Fri, Apr 24, 2009 at 05:14:55PM +0800, Gui Jianfeng wrote:
> Andrea Righi wrote:
> > On Fri, Apr 24, 2009 at 10:11:09AM +0800, Gui Jianfeng wrote:
> >> Andrea Righi wrote:
> >>> Dirty pages in the page cache can be processed asynchronously by kernel
> >>> threads (pdflush) according to a writeback policy. For this reason the
> >>> actual writes to the underlying block devices occur in a different IO
> >>> context from that of the task that originally generated the dirty pages
> >>> involved in the IO operation. This makes tracking and throttling
> >>> writeback IO more complicated than synchronous IO.
> >>>
> >>> The page_cgroup infrastructure, currently available only for the memory
> >>> cgroup controller, can be used to store the owner of each page and
> >>> properly track the writeback IO. This information is encoded in
> >>> page_cgroup->flags.
> >>   You encode the ID in page_cgroup->flags. If a cgroup gets removed, IMHO,
> >>   you should remove the corresponding ID from flags.
> > 
> > OK, the same ID could be reused by another cgroup. I think this should
> > happen very rarely, because IDs are recycled slowly anyway.
> > 
> > What about simply executing a sys_sync() when an io-throttle cgroup is
> > removed? If we're going to remove a cgroup, no additional dirty pages will
> > be generated by this cgroup, because it must be empty. And the sync
> > would ensure that old dirty pages are flushed back to disk (for those
> > pages the cgroup ID will simply be ignored).
> > 
> >>   One more thing: if a task moves from one cgroup to another, the ID in
> >>   flags also needs to be changed.
> > 
> > I do not agree here. Even if a task moves from one cgroup to another,
> > the cgroup that generated the dirty page is still the old one. Remember
> > that in this case we want to save the cgroup's identity, not the task's.
> 
>   If the task moves to a new cgroup, the dirty pages generated in the old
>   group still use the old ID. When these dirty pages are written back to
>   disk, the corresponding bios will be delayed according to the old group's
>   bandwidth limit. Am I right? I think we should apply the new bandwidth
>   limit when the actual IO happens, so we need to use the new ID for these
>   pages. But I think implementing this would be very complicated. :)

Right, but as I said, statistics are per cgroup, not per task. And even
if a task moves to another cgroup, IMHO it is correct to use the old
cgroup ID to write back the dirty pages, because that IO activity was
generated before the task moved, so the old rules should apply. I don't
see a strong reason to change this behaviour and complicate or slow down
the current implementation.
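
For illustration only, here is a rough user-space sketch of the behaviour
discussed above. It assumes a hypothetical layout in which the low 16 bits
of a 64-bit flags word hold ordinary page flags and the high bits hold the
owner cgroup ID (0 meaning "no owner"). All names (struct
page_cgroup_sketch, pcg_set_owner(), dirty_page() and so on) are made up
for this example and do not come from the actual patch. What it shows is
that writeback charges the ID stamped when the page was dirtied, so moving
the task afterwards does not change which cgroup pays for that IO.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real patch: low 16 bits = page flags,
 * high 48 bits = ID of the cgroup that dirtied the page (0 = no owner). */
#define PCG_ID_SHIFT   16
#define PCG_FLAGS_MASK ((UINT64_C(1) << PCG_ID_SHIFT) - 1)
#define PCG_ID_MAX     (UINT64_MAX >> PCG_ID_SHIFT)

struct page_cgroup_sketch {
	uint64_t flags;
};

struct task_sketch {
	uint64_t cgroup_id;	/* cgroup the task currently belongs to */
};

/* Store the owner ID in the high bits, preserving the ordinary flags. */
static inline void pcg_set_owner(struct page_cgroup_sketch *pc, uint64_t id)
{
	assert(id <= PCG_ID_MAX);	/* ID must fit in the reserved bits */
	pc->flags = (pc->flags & PCG_FLAGS_MASK) | (id << PCG_ID_SHIFT);
}

static inline uint64_t pcg_get_owner(const struct page_cgroup_sketch *pc)
{
	return pc->flags >> PCG_ID_SHIFT;
}

/* Drop the owner ID, leaving the ordinary flags untouched. */
static inline void pcg_clear_owner(struct page_cgroup_sketch *pc)
{
	pc->flags &= PCG_FLAGS_MASK;
}

/* Dirtying: stamp the page with the writer's cgroup at this moment. */
static inline void dirty_page(struct page_cgroup_sketch *pc,
			      const struct task_sketch *t)
{
	pcg_set_owner(pc, t->cgroup_id);
}

/* Writeback (e.g. from pdflush): use the ID stamped at dirtying time,
 * not the current cgroup of the task that wrote the data. */
static inline uint64_t writeback_charge_id(const struct page_cgroup_sketch *pc)
{
	return pcg_get_owner(pc);
}
```

In this sketch pcg_clear_owner() corresponds to Gui's suggestion of
removing the ID when a cgroup goes away; with the sys_sync() approach the
stale ID is simply left in place and ignored once the page is clean.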

Thanks,
-Andrea