Message-ID: <4D6FB751.3010608@kernel.dk>
Date: Thu, 03 Mar 2011 10:44:17 -0500
From: Jens Axboe
To: Vivek Goyal
CC: Justin TerAvest, Chad Talbott, Nauman Rafique, Divyesh Shah, lkml, Gui Jianfeng, Corrado Zoccolo, KAMEZAWA Hiroyuki, Greg Thelen
Subject: Re: Per iocontext request descriptor limits (Was: Re: RFC: default group_isolation to 1, remove option)
References: <20110301142002.GB25699@redhat.com> <4D6F0ED0.80804@kernel.dk> <20110303153007.GF16720@redhat.com>
In-Reply-To: <20110303153007.GF16720@redhat.com>

On 2011-03-03 10:30, Vivek Goyal wrote:
> On Wed, Mar 02, 2011 at 10:45:20PM -0500, Jens Axboe wrote:
>> On 2011-03-01 09:20, Vivek Goyal wrote:
>>> I think creating per group request pool will complicate the
>>> implementation further. (we have done that once in the past). Jens
>>> once mentioned that he liked number of requests per iocontext limit
>>> better than overall queue limit. So if we implement per iocontext
>>> limit, it will get rid of need of doing anything extra for group
>>> infrastructure.
>>>
>>> Jens, do you think per iocontext per queue limit on request
>>> descriptors make sense and we can get rid of per queue overall limit?
>>
>> Since we practically don't need a limit anymore to begin with (or so is
>> the theory).
>
> So what has changed that we don't need queue limits on nr_requests anymore?
> If we get rid of queue limits then we need to get rid of bdi congestion
> logic also and come up with some kind of ioc congestion logic so that
> a thread which does not want to sleep while submitting the request needs to
> check its own ioc for being congested or not for a specific device/bdi.

Right now congestion is a measure of request starvation on the OS side.
It may make sense to keep the notion of a congested device when we are
operating at the device limits. But as a blocking measure it should go
away.

No recent change is causing us to be able to throw away the limit. It
used to be that the vm got really unhappy with long queues, since you
could have tons of memory dirty. This works a LOT better now. And one
would hope that it does, since there are a number of drivers that don't
have limits. So when I say "practically" don't need limits anymore, the
hope is that we'll behave well enough with just per-ioc limits in place.

>> then yes we can move to per-ioc limits instead and get rid
>> of that queue state. We'd have to hold on to the ioc for the duration of
>> the IO explicitly from the request then.
>
> I think every request submitted on request queue already takes a reference
> on ioc (set_request) and reference is not dropped till completion. So
> ioc is anyway around till request completes.

That's only true for CFQ, it's not a block layer property. This would
have to be explicitly done.

>> I primarily like that implementation since it means we can make the IO
>> completion lockless, at least on the block layer side. We still have
>> state to complete in the schedulers that require that, but it's a good
>> step at least.
>
> Ok so in completion path the contention will move from queue_lock to
> ioc lock or something like that. (We hope that there are no other
> dependencies on queue here, devil lies in details :-))

Right, so it's spread out and in most cases the ioc will be completely
uncontended since it's usually private to the process.
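
To make the per-ioc idea above concrete, here is a minimal user-space
sketch in C11, not kernel code and not the patch mentioned in this
thread. All names (ioc_sketch, req_sketch, req_alloc, req_complete) and
the limit field are hypothetical. It only illustrates two points from
the discussion: the request takes an explicit reference on its io
context for the duration of the IO, and completion updates per-context
state with atomics instead of taking a queue-wide lock.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for an io context; NOT the kernel's struct io_context. */
struct ioc_sketch {
    atomic_int refcount;    /* keeps the context alive while requests are in flight */
    atomic_int nr_inflight; /* requests currently charged to this context */
    int limit;              /* assumed per-context request limit */
};

/* Hypothetical request: it pins its io context explicitly. */
struct req_sketch {
    struct ioc_sketch *ioc;
};

static struct ioc_sketch *ioc_new(int limit)
{
    struct ioc_sketch *ioc = malloc(sizeof(*ioc));
    if (!ioc)
        return NULL;
    atomic_init(&ioc->refcount, 1); /* the creator's reference */
    atomic_init(&ioc->nr_inflight, 0);
    ioc->limit = limit;
    return ioc;
}

static void ioc_put(struct ioc_sketch *ioc)
{
    /* Free the context when the last reference is dropped. */
    if (atomic_fetch_sub(&ioc->refcount, 1) == 1)
        free(ioc);
}

/* Charge one request to the context, or fail if it is at its limit. */
static struct req_sketch *req_alloc(struct ioc_sketch *ioc)
{
    int cur = atomic_load(&ioc->nr_inflight);

    do {
        if (cur >= ioc->limit)
            return NULL; /* a real caller would sleep or back off here */
    } while (!atomic_compare_exchange_weak(&ioc->nr_inflight, &cur, cur + 1));

    struct req_sketch *rq = malloc(sizeof(*rq));
    if (!rq) {
        atomic_fetch_sub(&ioc->nr_inflight, 1); /* undo the charge */
        return NULL;
    }
    atomic_fetch_add(&ioc->refcount, 1); /* held until completion */
    rq->ioc = ioc;
    return rq;
}

/* Completion touches only per-context state; no shared queue lock is taken. */
static void req_complete(struct req_sketch *rq)
{
    struct ioc_sketch *ioc = rq->ioc;

    atomic_fetch_sub(&ioc->nr_inflight, 1);
    free(rq);
    ioc_put(ioc);
}
```

Because the context is usually private to one process, the counters in
such a scheme would rarely be touched from more than one CPU at a time,
which is the "spread out" contention described above.
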

> The other potential issue with this approach is how will we handle the
> case of flusher thread submitting IO. At some point of time we want to
> account it to right cgroup.
>
> Retrieving iocontext from bio will be hard as it will at least require
> one extra pointer in page_cgroup and I am not sure how feasible that is.
>
> Or we could come up with the concept of group iocontext. With the help
> of page cgroup we should be able to get to cgroup, retrieve the right
> group iocontext and check the limit against that. But I guess this
> gets complicated.
>
> So if we move to ioc based limit, then for async IO, a reasonable way
> would be to find the io context of submitting task and operate on that
> even if that means increased page_cgroup size.

For now it's not a complicated effort, I already have a patch for this.
If page tracking needs extra complexity, it'll have to remain in the
page tracking code.

-- 
Jens Axboe