Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753506AbZIIRaz (ORCPT ); Wed, 9 Sep 2009 13:30:55 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org
        id S1753501AbZIIRay (ORCPT ); Wed, 9 Sep 2009 13:30:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:39938 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1752416AbZIIRax (ORCPT ); Wed, 9 Sep 2009 13:30:53 -0400
Date: Wed, 9 Sep 2009 13:30:03 -0400
From: Vivek Goyal
To: Fabio Checconi
Cc: Rik van Riel, Ryo Tsuruta, linux-kernel@vger.kernel.org,
        dm-devel@redhat.com, jens.axboe@oracle.com, agk@redhat.com,
        akpm@linux-foundation.org, nauman@google.com,
        guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
        balbir@linux.vnet.ibm.com
Subject: Re: Regarding dm-ioband tests
Message-ID: <20090909173003.GE8256@redhat.com>
References: <20090904231129.GA3689@redhat.com>
        <20090907.200222.193693062.ryov@valinux.co.jp>
        <4AA51065.6050000@redhat.com>
        <20090908.120119.71095369.ryov@valinux.co.jp>
        <4AA6AF58.3050501@redhat.com>
        <20090909000900.GK17468@gandalf.sssup.it>
        <20090909020620.GC3594@redhat.com>
        <20090909154126.GG17468@gandalf.sssup.it>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20090909154126.GG17468@gandalf.sssup.it>
User-Agent: Mutt/1.5.18 (2008-05-17)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Length: 4437
Lines: 99

On Wed, Sep 09, 2009 at 05:41:26PM +0200, Fabio Checconi wrote:
> > From: Vivek Goyal
> > Date: Tue, Sep 08, 2009 10:06:20PM -0400
> >
> > On Wed, Sep 09, 2009 at 02:09:00AM +0200, Fabio Checconi wrote:
> > > Hi,
> > >
> > > > From: Rik van Riel
> > > > Date: Tue, Sep 08, 2009 03:24:08PM -0400
> > > >
> > > > Ryo Tsuruta wrote:
> > > > >Rik van Riel wrote:
> > > >
> > > > >>Are you saying that dm-ioband is purposely unfair,
> > > > >>until a certain load level is reached?
> > > > >
> > > > >Not unfair, dm-ioband(weight policy) is intentionally designed to
> > > > >use bandwidth efficiently, weight policy tries to give spare bandwidth
> > > > >of inactive groups to active groups.
> > > >
> > > > This sounds good, except that the lack of anticipation
> > > > means that a group with just one task doing reads will
> > > > be considered "inactive" in-between reads.
> > > >
> > >
> > > anticipation helps in achieving fairness, but CFQ currently disables
> > > idling for nonrot+NCQ media, to avoid the resulting throughput loss on
> > > some SSDs. Are we really sure that we want to introduce anticipation
> > > everywhere, not only to improve throughput on rotational media, but to
> > > achieve fairness too?
> >
> > That's a good point. Personally I think that fairness requirements for
> > individual queues and groups are little different. CFQ in general seems
> > to be focussing more on latency and throughput at the cost of fairness.
> >
> > With groups, we probably need to put a greater amount of emphasis on group
> > fairness. So group will be a relatively a slower entity (with anticiaption
> > on and more idling), but it will also give you a greater amount of
> > isolation. So in practice, one will create groups carefully and they will
> > not proliferate like queues. This can mean overall reduced throughput on
> > SSD.
> >
>
> Ok, I personally agree on that, but I think it's something to be documented.
>

Sure. I will document it in documentation file.

>
> > Having said that, group idling is tunable and one can always reduce it to
> > achieve a balance between fairness vs throughput depending on his need.
> >
>
> This is good, however tuning will not be an easy task (at least, in my
> experience with BFQ it has been a problem): while for throughput usually
> there are tradeoffs, as soon as a queue/group idles and then timeouts,
> from the fairness perspective the results soon become almost random
> (i.e., depending on the rate of successful anticipations, but in the
> common case they are unpredictable)...

I am lost in the last few lines. I guess you are suggesting that static
tuning is hard, and that dynamically adjusting idling has the limitation
that it might not be accurate all the time?

I will explain how things work in the current set of io scheduler
patches.

Currently, on top of queue idling, I have implemented group idling as
well. Queue idling is dynamic: an io scheduler like CFQ keeps track of
the traffic pattern on the queue and enables/disables idling
dynamically. So in this case fairness depends on the rate of successful
anticipations by the io scheduler.

Group idling is currently static in nature and implemented purely in the
elevator fair queuing layer. Group idling kicks in only when a group is
empty at the time of queue expiration and the underlying io scheduler
has not chosen to enable idling on the queue. This gives us the
guarantee that the group will keep getting its fair share of the disk as
long as a new request arrives in the group within that idling period.

Implementing group idling this way ensures that it does not bog down the
io scheduler, and queue switching within a group can still be very fast
(no idling on many of the queues by CFQ).

Now in the case of SSD, if group idling is really hurting somebody, I
would expect him to set it to either 1 or 0. You might get better
throughput, but then expect fairness for the group only if the group is
continuously backlogged. (Something the dm-ioband guys seem to be
doing.)

So do you think that adjusting this "group_idling" tunable is too
complicated, and that there are better ways to handle it in the case of
SSD+NCQ?
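
For readers following the thread, the group idling rule described above can be modelled roughly as below. This is only a sketch and not code from the patches: the `Group` class, the function name `should_idle_on_group`, and the reading of a `group_idling` value of 0 as "no group idling at all" are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Hypothetical stand-in for an I/O group; not a structure from the patches."""
    queued_requests: int  # requests still pending in the group


def should_idle_on_group(group: Group, queue_idling_enabled: bool,
                         group_idling: int) -> bool:
    """Decide, at queue expiration, whether to idle on the group.

    group_idling is the value of the tunable; treating 0 as "disabled"
    is an assumption of this sketch.
    """
    if group_idling == 0:
        return False  # tunable switched off: no group-level waiting
    if group.queued_requests > 0:
        return False  # group is still backlogged, nothing to wait for
    if queue_idling_enabled:
        return False  # the io scheduler already idles on the queue
    return True  # empty group and no queue idling: wait for a new request


if __name__ == "__main__":
    # Empty group, io scheduler did not idle on the queue, tunable is on.
    print(should_idle_on_group(Group(queued_requests=0), False, 8))
```

In this model, group idling is the fallback that only applies when the io scheduler's own queue idling is not in effect, which matches the point that queue switching within a group is left alone.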

Thanks
Vivek