Date: Wed, 4 Nov 2009 14:00:33 -0500
From: Vivek Goyal
To: Divyesh Shah
Cc: Jeff Moyer, linux-kernel@vger.kernel.org, jens.axboe@oracle.com, nauman@google.com, lizf@cn.fujitsu.com, ryov@valinux.co.jp, fernando@oss.ntt.co.jp, s-uchida@ap.jp.nec.com, taka@valinux.co.jp, guijianfeng@cn.fujitsu.com, balbir@linux.vnet.ibm.com, righi.andrea@gmail.com, m-ikeda@ds.jp.nec.com, akpm@linux-foundation.org, riel@redhat.com, kamezawa.hiroyu@jp.fujitsu.com
Subject: Re: [PATCH 03/20] blkio: Introduce the notion of weights
Message-ID: <20091104190033.GG2870@redhat.com>
References: <1257291837-6246-1-git-send-email-vgoyal@redhat.com> <1257291837-6246-4-git-send-email-vgoyal@redhat.com> <20091104154135.GA2870@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Nov 04, 2009 at 09:07:41AM -0800, Divyesh Shah wrote:
> On Wed, Nov 4, 2009 at 7:41 AM, Vivek Goyal wrote:
> >
> > On Wed, Nov 04, 2009 at 10:06:16AM -0500, Jeff Moyer wrote:
> > > Vivek Goyal writes:
> > >
> > > > o Introduce the notion of weights. Priorities are mapped to weights internally.
> > > >   These weights will be useful once IO groups are introduced and a group's share
> > > >   will be decided by the group weight.
> > >
> > > I'm sorry, but I need more background to review this patch.
> > > Where do the min and max come from?  Why do you scale 7-0 from
> > > 200-900?  How does this map to what was there before (exactly,
> > > approximately)?
> > >
> >
> > Well, so far we only have the notion of io priority for a process, and
> > based on that we determine the time slice length.
> >
> > Soon we will throw cfq groups into the mix as well. Because the cpu IO
> > controller is weight driven, people have shown a preference for deciding
> > a group's share based on its weight rather than introducing the notion
> > of ioprio for groups.
> >
> > So now the core scheduling algorithm only recognizes weights for
> > entities (be it cfq queues or cfq groups), and we need to convert the
> > ioprio of a cfqq into a weight.
> >
> > Now it is a matter of deciding which weight range we support and how
> > ioprio should be mapped onto these weights. We can always change the
> > mappings, but to begin with, I have followed this approach.
> >
> > Allow a weight range from 100 to 1000. Allowing too small a weight,
> > like "1", can lead to very interesting corner cases, and I wanted to
> > avoid that in the first implementation. For example, if some group
> > with weight "1" gets a time slice of 100ms, its vtime will be really
> > high, and after that it will not get scheduled in for a very long time.
> >
> > Secondly, allowing too small a weight can make the vtime of the tree
> > move very fast, with a faster wrap around of min_vdisktime (especially
> > on SSDs, where idling might not be enabled and for every queue expiry
> > we will attribute a minimum of 1ms of slice. If the weight of the group
> > is "1", its vtime will jump much higher and min_vdisktime will move very
> > fast). We don't want too fast a wrap around of min_vdisktime
> > (especially in the case of the idle tree. That infrastructure is not
> > part of the current patches).
> >
> > Hence, to begin with, I wanted to limit the range of weights allowed,
> > because a wider range opens up a lot of interesting corner cases.
> > That's why I limited the minimum weight to 100. So at most a user can
> > expect a 1000/100 = 10 times service differentiation between the
> > highest and lowest weight groups. If folks need more than that, we can
> > look into it once things stabilize.
>
> We definitely need the 1:100 differentiation. I'm ok with adding that
> later after the core set of patches stabilize, but just letting you
> know that it is important to us.

Good to know. I will begin with a maximum service difference of 10 times,
and once things stabilize, I will enable a wider range of weights.

> Also curious why you chose a higher
> range 100-1000 instead of 10-100? For smaller vtime leaps?

Good question. Initially we had thought that a range of 1-1000 should be
good enough. Later we decided to cap the minimum weight at 100. But the
same can be achieved with a smaller range of 1-100 and capping the minimum
weight at 10. This will also make vtime leap forward more slowly. Later, if
somebody needs a ratio higher than 1:100, we can think of supporting an
even wider weight range.

Thanks Divyesh for the idea. I think I will change the weight range to
10-100 and map ioprio 0-7 onto weights 90 down to 20.

Thanks
Vivek

>
> >
> > Priority and weights follow reverse order. A higher ioprio value (that
> > is, a lower priority) means a lower weight, and vice versa.
> >
> > Currently we support 8 priority levels, and prio 4 is the middle point.
> > Any prio value higher than 4 gets a 20% smaller slice than prio 4, and
> > any value lower than 4 gets a 20% larger slice than prio 4 (20% higher
> > or lower for each priority level).
> >
> > For the weight range 100 - 1000, 500 can be considered the mid point.
> > This is what the priority mapping looks like:
> >
> >     100 200 300 400 500 600 700 800 900 1000  (Weights)
> >         7   6   5   4   3   2   1   0         (io prio)
> >
> > Once priorities are converted to weights, we are able to retain the
> > notion of a 20% difference between prio levels by choosing 500 as the
> > mid point and mapping prio 0-7 to weights 900-200, hence this mapping.
> >
> > I am all ears if you have any suggestions on how this can be handled
> > better.
> >
> > Thanks
> > Vivek