Date: Thu, 13 Nov 2008 14:49:49 +0100
From: Fabio Checconi
To: Nauman Rafique
Cc: Vivek Goyal, Peter Zijlstra, linux-kernel@vger.kernel.org,
	containers@lists.linux-foundation.org,
	virtualization@lists.linux-foundation.org, jens.axboe@oracle.com,
	Hirokazu Takahashi, Ryo Tsuruta, Andrea Righi, Satoshi UCHIDA,
	fernando@oss.ntt.co.jp, balbir@linux.vnet.ibm.com, Andrew Morton,
	menage@google.com, ngupta@google.com, Rik van Riel, Jeff Moyer,
	dpshah@google.com, Mike Waychison, rohitseth@google.com,
	paolo.valente@unimore.it
Subject: Re: [patch 0/4] [RFC] Another proportional weight IO controller
Message-ID: <20081113134949.GH14817@gandalf.sssup.it>
References: <20081106163957.GB7461@redhat.com> <1225990327.7803.4776.camel@twins>
	<20081106170830.GD7461@redhat.com> <20081107141943.GC21884@redhat.com>
	<20081110141143.GC26956@redhat.com> <20081111223024.GA31527@redhat.com>

Hi,

> From: Nauman Rafique
> Date: Wed, Nov 12, 2008 01:20:13PM -0800
> ...
> >> CFQ can be trivially modified to do proportional division (i.e.,
> >> give time slices in proportion to weight instead of priority).
> >> And such a solution would avoid the idleness problem like the one
> >> you mentioned above.
> >
> > Can you just elaborate a little on how you get around the idleness
> > problem?  If you don't create idleness, then if two tasks in two
> > cgroups are doing sequential IO, they might simply get into lockstep
> > and we will not achieve any differentiated service proportionate to
> > their weights.
>
> I was thinking of a more cfq-like solution for proportional division
> at the elevator level (i.e., not a token-based solution).  There are
> two options for proportional bandwidth division at the elevator level:
> 1) change the size of the time slice in proportion to the weights, or
> 2) allocate equal time slices each time, but allocate more slices to
> the cgroup with more weight.  For (2), we can actually keep track of
> the time taken to serve requests and allocate time slices in such a
> way that the actual disk time is proportional to the weight.  We can
> adopt a fair-queuing (http://lkml.org/lkml/2008/4/1/234) like approach
> for this if we want to go that way.
>
> I am not sure whether the solutions mentioned above will have the
> lockstep problem you mentioned or not.  Since we are allocating time
> slices, and would have anticipation built in (just like cfq), we would
> have some level of idleness.  But this idleness can be predicted based
> on a thread's behavior.

If I understand that correctly, the problem may arise whenever you have
to deal with *synchronous* I/O, where you may not see the streams of
requests generated by tasks as continuously backlogged (while the
algorithm used to distribute bandwidth makes the implicit assumption
that they are, as in the cfq case).
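Just to fix ideas on what that assumption looks like, here is a toy
sketch of option (2) above (not cfq code, all names are made up): the
next slice always goes to the group with the smallest served/weight
ratio, which produces the intended ratios only as long as every group
always has requests queued.

/*
 * Toy sketch only: each group tracks the disk time it has actually
 * consumed, and the next slice goes to the group with the smallest
 * weighted service (served / weight), so over time service converges
 * to the configured weights, provided every group stays backlogged.
 */
#include <stdio.h>

struct io_group {
	const char *name;
	unsigned int weight;		/* relative share */
	unsigned long long served;	/* disk time consumed so far (us) */
};

static struct io_group *pick_next(struct io_group *grp, int nr)
{
	struct io_group *best = &grp[0];
	int i;

	/* smallest served/weight, compared without divisions */
	for (i = 1; i < nr; i++)
		if (grp[i].served * best->weight <
		    best->served * grp[i].weight)
			best = &grp[i];
	return best;
}

int main(void)
{
	struct io_group groups[] = {
		{ "A", 100, 0 },
		{ "B", 200, 0 },
	};
	int i;

	for (i = 0; i < 6; i++) {
		struct io_group *g = pick_next(groups, 2);
		g->served += 10000;	/* charge a 10ms slice */
		printf("slice %d -> group %s\n", i, g->name);
	}
	return 0;
}

With weights 100 and 200 the second group ends up with twice the
slices, but only because both groups are assumed to be always
backlogged; with synchronous I/O that assumption can break.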
A cfq-like solution with idling enabled AFAIK should not suffer from
this problem, as it creates a backlog for the process being anticipated.
But anticipation is not always used, and cfq currently disables it for
SSDs and in other cases where it may hurt performance (e.g., NCQ drives
in the presence of seeky loads).  So, in these cases, something still
needs to be done if we want a proportional bandwidth distribution and
we don't want to pay the extra cost of idling when it's not strictly
necessary.
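Just as an illustration of the kind of decision involved (again not
cfq's actual logic, the fields and thresholds are made up), the "when
to idle" choice looks roughly like this:

#include <stdbool.h>
#include <stdio.h>

struct queue_stats {
	bool sync;		/* queue serves synchronous requests */
	bool rotational;	/* backing device is a rotational disk */
	unsigned int seekiness;	/* recent seek estimate, 0..100 */
};

/*
 * Idling builds backlog for a sync process, but on SSDs or with very
 * seeky loads the time lost usually outweighs the benefit, so skip it.
 */
static bool should_idle(const struct queue_stats *q)
{
	if (!q->sync)
		return false;	/* async writers keep the queue backlogged */
	if (!q->rotational)
		return false;	/* SSD: anticipation rarely pays off */
	if (q->seekiness > 50)
		return false;	/* very seeky: idling mostly wastes time */
	return true;
}

int main(void)
{
	struct queue_stats ssd  = { .sync = true, .rotational = false, .seekiness = 10 };
	struct queue_stats disk = { .sync = true, .rotational = true,  .seekiness = 10 };

	printf("idle on SSD queue:  %d\n", should_idle(&ssd));
	printf("idle on disk queue: %d\n", should_idle(&disk));
	return 0;
}

Whenever something like should_idle() returns false we are back to the
non-backlogged case above, which is exactly where a different mechanism
is needed to keep the distribution proportional.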