Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1759334AbZAWKOU (ORCPT ); Fri, 23 Jan 2009 05:14:20 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1756325AbZAWKOH (ORCPT ); Fri, 23 Jan 2009 05:14:07 -0500 Received: from fms-01.valinux.co.jp ([210.128.90.1]:54081 "EHLO mail.valinux.co.jp" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1754561AbZAWKOF (ORCPT ); Fri, 23 Jan 2009 05:14:05 -0500 Date: Fri, 23 Jan 2009 19:14:04 +0900 (JST) Message-Id: <20090123.191404.39168431.ryov@valinux.co.jp> To: vgoyal@redhat.com Cc: dm-devel@redhat.com, agk@redhat.com, linux-kernel@vger.kernel.org, containers@lists.linux-foundation.org, nauman@google.com, dpshah@google.com, lizf@cn.fujitsu.com, mikew@google.com, fchecconi@gmail.com, paolo.valente@unimore.it, jens.axboe@oracle.com, fernando@intellilink.co.jp, s-uchida@ap.jp.nec.com, taka@valinux.co.jp, guijianfeng@cn.fujitsu.com, arozansk@redhat.com, jmoyer@redhat.com, riel@redhat.com, peterz@infradead.org, menage@google.com, balbir@linux.vnet.ibm.com, dhaval@linux.vnet.ibm.com, chrisw@redhat.com Subject: Re: [dm-devel] [PATCH 1/2] dm-ioband: I/O bandwidth controller v1.10.0: Source code and patch From: Ryo Tsuruta In-Reply-To: <20090122161218.GA28795@redhat.com> References: <20090120.141114.226778416.ryov@valinux.co.jp> <20090120155334.GH9859@agk.fab.redhat.com> <20090122161218.GA28795@redhat.com> X-Mailer: Mew version 5.2.52 on Emacs 22.1 / Mule 5.0 (SAKAKI) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 5122 Lines: 113 Hi Vivek, Thanks for your comments. > I am not very sure why dm-ioband folks want to enable IO control on any > xyz block device but in the past I got two responses. > > 1. Need to control end devices which don't have any elevator attached. > 2. Need to do IO control for devices which are effectively network backed. > for example, an NFS mounted file loop mounted as a block device. The two responses are issues of IO scheduler based controllers, not reasons why we implement the IO controller as a device mapper driver. The reasons of that are: - A user have a choice whether to use dm-ioband or not, and dm-ioband doesn't make any effects on the system if a user doesn't want to use it. - The dm device is highly independent module, so we don't need to modify the existing kernel code including the IO schedulers. It can keep the IO scheduler implementation simple. So, dm-ioband can co-exist with any other IO controllers from a user's and kernel developer's perspective. > Why generic IO controller is not good for every case > ==================================================== > To my knowledge, there have been two generic controller implementations. > One is dm-ioband and other is an RFC patch by me. Following is the link. > > http://lkml.org/lkml/2008/11/6/227 > > The biggest issue with generic controller is that they can buffer the > bio's at higher layer (once a cgroup is backed up) and then later release > those bios in FIFO manner. This can conflict with unerlying IO scheduler's > assumptions. Following example comes to mind. I don't think you are completely right. > - If there is one task of io priority 0 in a cgroup and rest of the tasks > are of io prio 7. All the tasks belong to best effort class. If tasks of > lower priority (7) do lot of IO, then due to buffering there is a chance > that IO from lower prio tasks is seen by CFQ first and io from higher prio > task is not seen by cfq for quite some time hence that task not getting it > fair share with in the cgroup. Similar situation can arise with RT tasks > also. Whether using dm-ioband or not, if the tasks of IO priority 7 do lot of IO, then the device queue is going to be full and tasks which tries to issue IOs are blocked until the queue get a slot. The IOs are backlogged even if they are issued from the task of IO priority 0. I don't understand why you think it's the biggest issue. The same thing is going to happen without dm-ioband. If I were you, I create two cgroups and let tasks of lower priority belong to one cgroup and tasks of higher priority belong to another, and give higher bandwidth to the cgroup to which the higher priority tasks belong. What do you think about this way? > - Task grouping logic > - We already have the notion of cgroup where tasks can be grouped > in hierarhical manner. dm-ioband does not make full use of that and > comes up with own mechansim of grouping tasks (apart from cgroup). > And there are odd ways of specifying cgroup id which configuring the > dm-ioband device. I think once somebody has created the cgroup > hieararchy, any IO controller logic should be able to internally > read that hiearchy and provide control. There should not be need > of any other configuration utity on top of cgroup. > > My RFC patches had done that. Dm-ioband can work with the bio-cgroup mechanism, which makes task groups in manner of the cgroup, of course. I already have a basic design to make dm-ioband support the cgroup hierarchy. This should be started after the core code of bio-cgroup, which helps trace each I/O requests, is merged in -mm tree. And the reason dm-ioband uses cgroup id to specify a cgroup is that the current cgroup infrastructure lacks features to manage resources placed in the kernel modules. > - Need of a dm device for every device we want to control > > - This requirement looks odd. It forces everybody to use dm-tools > and if there are lots of disks in the system, configuation is > pain. I don't think it's so pain. I think you are already using LVM devices on your boxes. Setting up dm-ioband is the same as that for LVM. And some scripts or something similar will help you set up them. And it is also possible this algorithm can be directly implemented in the block layer if this is really needed. > - Does it support hiearhical grouping? > > - I have not looked very closely at dm-ioband patches about this and > had asked ryo a question about this (no response). > > Ryo does, dm-ioband support hierarhical grouping configuration? I'm sorry I missed your email with the question. I already have a design plan for it and I will start to implement it if there are a lot of requests for this. But I doubt this should be implemented in kernel, which can be placed in user-land, such as a daemon program. Thanks, Ryo Tsuruta -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/