Date: Thu, 16 Apr 2009 11:47:50 +0900 (JST)
From: Ryo Tsuruta <ryov@valinux.co.jp>
To: vgoyal@redhat.com
Cc: fernando@oss.ntt.co.jp, linux-kernel@vger.kernel.org,
	jmoyer@redhat.com, dm-devel@redhat.com, jens.axboe@oracle.com,
	nauman@google.com, agk@redhat.com, balbir@linux.vnet.ibm.com
Subject: Re: [dm-devel] Re: dm-ioband: Test results.
Message-Id: <20090416.114750.226794985.ryov@valinux.co.jp>
In-Reply-To: <20090415.223832.71125857.ryov@valinux.co.jp>
References: <20090413.130552.226792299.ryov@valinux.co.jp>
	<20090415043759.GA8349@redhat.com>
	<20090415.223832.71125857.ryov@valinux.co.jp>

Hi Vivek,

> General thoughts about dm-ioband
> ================================
> - Implementing control at the second level has the advantage that one
>   does not have to muck with IO scheduler code. But then it also has
>   the disadvantage that there is no communication with the IO
>   scheduler.
>
> - dm-ioband buffers bios at a higher layer and then does a FIFO
>   release of these bios. This FIFO release can lead to priority
>   inversion problems in certain cases, where RT requests end up way
>   behind BE requests, or to reader starvation, where reader bios get
>   hidden behind writer bios, etc. These are hard-to-notice issues in
>   user space. I guess the above RT results do highlight the RT task
>   problems. I am still working on other test cases to see if I can
>   show the problem.
>
> - dm-ioband does this extra grouping logic using dm messages. Why is
>   the cgroup infrastructure not sufficient to meet your needs, like
>   grouping tasks based on uid etc.? I think we should get rid of all
>   the extra grouping logic and just use cgroup for the grouping
>   information.

I want dm-ioband to be usable even without cgroup, and to have the
flexibility to support various types of objects.

> - Why do we need to specify bio cgroup ids to dm-ioband externally
>   with the help of dm messages? A user should be able to just create
>   the cgroups, put the tasks in the right cgroups, and then
>   everything should just work fine.

This is to handle cgroups in dm-ioband easily, and it keeps the code
simple.

> - Why do we have to put another dm-ioband device on top of every
>   partition or existing device-mapper device to control it? Is it
>   possible to do this control in the make_request function of the
>   request queue so that we don't end up creating additional dm
>   devices? I had posted a crude RFC patch as proof of concept but did
>   not continue the development because of the fundamental issue of
>   FIFO release of buffered bios.
>
>   http://lkml.org/lkml/2008/11/6/227
>
>   Can you please have a look and provide feedback about why we cannot
>   go in the direction of the above patches and why we need to create
>   an additional dm device?
>
> I think that in its current form dm-ioband is hard to configure, and
> we should look for ways to simplify the configuration.

This can be solved by using a tool or a small script, as sketched
below.
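For example, a helper script along these lines could create an ioband
device and configure its groups in one step. This is only a sketch:
the device name, uids and weights are made-up examples, and the table
line and dm messages follow the dm-ioband documentation in simplified
form. The messages at the end also illustrate the grouping setup
discussed above, here grouping by uid instead of by cgroup id.

  #!/bin/sh
  # Sketch of a small setup helper for dm-ioband.
  DEV=/dev/sda1        # underlying device to control (example)
  NAME=ioband1         # name of the new ioband device (example)

  # Create an ioband device on top of $DEV using the weight policy,
  # with a default group weight of 100.
  SIZE=$(blockdev --getsize "$DEV")
  echo "0 $SIZE ioband $DEV 1 0 0 none weight 0 :100" | \
      dmsetup create "$NAME"

  # Switch the grouping type to uid, create a group for each of two
  # uids, and assign their weights with dm messages.
  dmsetup message "$NAME" 0 type user
  dmsetup message "$NAME" 0 attach 1000
  dmsetup message "$NAME" 0 attach 2000
  dmsetup message "$NAME" 0 weight 1000:40
  dmsetup message "$NAME" 0 weight 2000:10

A tool could generate these commands from a simple configuration file,
so a user would not have to deal with the dmsetup table syntax
directly.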
> - I personally think that even group IO scheduling should be done at
>   the IO scheduler level, and we should not break IO scheduling into
>   two parts, where group scheduling is done by a higher-level IO
>   scheduler sitting in the dm layer and IO scheduling among tasks
>   within a group is done by the actual IO scheduler.
>
>   But this also means more work, as one has to muck around with the
>   core IO schedulers to make them cgroup-aware and also make sure
>   existing functionality is not broken. I posted the patches here:
>
>   http://lkml.org/lkml/2009/3/11/486
>
>   Can you please let us know why the IO scheduler based approach does
>   not work for you?

I think your approach is not bad, but my purpose is to control the
disk bandwidth of virtual machines with device-mapper and dm-ioband.

I think device-mapper is a well-designed system for the following
reasons:
- It can easily add new functions to a block device.
- There is no need to muck around with the existing kernel code.
- dm devices are detachable; they have no effect on the system when a
  user doesn't use them.

So I think dm-ioband and your IO controller can coexist. What do you
think about it?

> Jens, it would be nice to hear your opinion about two-level vs
> one-level control. Do you think that the common-layer approach is the
> way to go, where one can control things more tightly, or is the FIFO
> release of bios from a second-level controller fine, so that we can
> live with this additional serialization in the layer just above the
> IO scheduler?
>
> - There is no notion of RT cgroups. So even if one wants to run an RT
>   task in the root cgroup to make sure it gets full access to the
>   disk, it can't do that. It has to share the bandwidth with the
>   other competing groups.
>
> - dm-ioband controls the amount of IO done per second. Will a seeky
>   process not run away with more disk time?

Could you elaborate on this? dm-ioband doesn't control the amount of
IO per second.

> Additionally, at the group level we will provide fairness in terms of
> the amount of IO (number of blocks transferred etc.), and within a
> group CFQ will try to provide fairness in terms of disk access time
> slices. I don't even know whether it is a matter of concern or not. I
> was thinking that one uniform policy on the hierarchical scheduling
> tree would probably have been better. Just thinking out loud...
>
> Thanks
> Vivek

Thanks,
Ryo Tsuruta