Message-ID: <49F1830F.8020609@cn.fujitsu.com>
Date: Fri, 24 Apr 2009 17:14:55 +0800
From: Gui Jianfeng
To: Andrea Righi
CC: Paul Menage, Balbir Singh, KAMEZAWA Hiroyuki, agk@sourceware.org, akpm@linux-foundation.org, axboe@kernel.dk, baramsori72@gmail.com, Carl Henrik Lunde, dave@linux.vnet.ibm.com, Divyesh Shah, eric.rannaud@gmail.com, fernando@oss.ntt.co.jp, Hirokazu Takahashi, Li Zefan, matt@bluehost.com, dradford@bluehost.com, ngupta@google.com, randy.dunlap@oracle.com, roberto@unbit.it, Ryo Tsuruta, Satoshi UCHIDA, subrata@linux.vnet.ibm.com, yoshikawa.takuya@oss.ntt.co.jp, Nauman Rafique, fchecconi@gmail.com, paolo.valente@unimore.it, containers@lists.linux-foundation.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] page_cgroup: provide a generic page tracking infrastructure
References: <1240090712-1058-1-git-send-email-righi.andrea@gmail.com> <1240090712-1058-4-git-send-email-righi.andrea@gmail.com> <49F11FBD.3070705@cn.fujitsu.com> <20090424083116.GB8535@linux>
In-Reply-To: <20090424083116.GB8535@linux>

Andrea Righi wrote:
> On Fri, Apr 24, 2009 at 10:11:09AM +0800, Gui Jianfeng wrote:
>> Andrea Righi wrote:
>>> Dirty pages in the page cache can be processed asynchronously by kernel
>>> threads (pdflush) using a writeback policy. For this reason the real
>>> writes to the underlying block devices occur in a different IO context
>>> respect to the task that originally generated the dirty pages involved
>>> in the IO operation. This makes the tracking and throttling of writeback
>>> IO more complicate respect to the synchronous IO.
>>>
>>> The page_cgroup infrastructure, currently available only for the memory
>>> cgroup controller, can be used to store the owner of each page and
>>> opportunely track the writeback IO. This information is encoded in
>>> page_cgroup->flags.
>> You encode id in page_cgroup->flags, if a cgroup get removed, IMHO, you
>> should remove the corresponding id in flags.
>
> OK, the same ID could be reused by another cgroup. I think this
> should happen very rarely because IDs are recovered slowly anyway.
>
> What about simply executing a sys_sync() when a io-throttle cgroup is
> removed? If we're going to remove a cgroup no additional dirty page will
> be generated by this cgroup, because it must be empty. And the sync
> would allow that old dirty pages will be flushed back to disk (for those
> pages the cgroup ID will be simply ignored).
>
>> One more thing, if a task is moving from a cgroup to another, the id in
>> flags also need to be changed.
>
> I do not agree here. Even if a task is moving from a cgroup to another
> the cgroup that generated the dirty page is always the old one. Remember
> that we want to save cgroup's identity in this case, and not the task.

If the task moves to a new cgroup, the dirty pages it generated in the old
group still carry the old ID. When these dirty pages are written back to
disk, the corresponding bios will be delayed according to the old group's
bandwidth limit. Am I right? I think we should apply the new group's
bandwidth limit when the actual IO happens, so we would need to use the
new ID for these pages. But I think implementing this would be very
complicated.
:)

>
> Thanks,
> -Andrea
>

--
Regards
Gui Jianfeng