Date: Mon, 05 Oct 2009 23:55:35 +0900 (JST)
Message-Id: <20091005.235535.193690928.ryov@valinux.co.jp>
To: vgoyal@redhat.com
Cc: m-ikeda@ds.jp.nec.com, nauman@google.com, linux-kernel@vger.kernel.org,
    jens.axboe@oracle.com, containers@lists.linux-foundation.org,
    dm-devel@redhat.com, dpshah@google.com, lizf@cn.fujitsu.com,
    mikew@google.com, fchecconi@gmail.com, paolo.valente@unimore.it,
    fernando@oss.ntt.co.jp, s-uchida@ap.jp.nec.com, taka@valinux.co.jp,
    guijianfeng@cn.fujitsu.com, jmoyer@redhat.com,
    dhaval@linux.vnet.ibm.com, balbir@linux.vnet.ibm.com,
    righi.andrea@gmail.com, agk@redhat.com, akpm@linux-foundation.org,
    peterz@infradead.org, jmarchan@redhat.com,
    torvalds@linux-foundation.org, mingo@elte.hu, riel@redhat.com,
    yoshikawa.takuya@oss.ntt.co.jp
Subject: Re: IO scheduler based IO controller V10
From: Ryo Tsuruta
In-Reply-To: <20091005123148.GB22143@redhat.com>
References: <4AC6623F.70600@ds.jp.nec.com>
    <20091005.193808.104033719.ryov@valinux.co.jp>
    <20091005123148.GB22143@redhat.com>

Hi Vivek,

Vivek Goyal wrote:
> On Mon, Oct 05, 2009 at 07:38:08PM +0900, Ryo Tsuruta wrote:
> > Hi,
> >
> > Munehiro Ikeda wrote:
> > > Vivek Goyal wrote, on 10/01/2009 10:57 PM:
> > > > Before finishing this mail, will throw a whacky idea in the
ring. I was
> > > > going through the request based dm-multipath paper. Will it make sense
> > > > to implement request based dm-ioband? So basically we implement all the
> > > > group scheduling in CFQ and let dm-ioband implement a request function
> > > > to take the request and break it back into bios. This way we can keep
> > > > all the group control at one place and also meet most of the requirements.
> > > >
> > > > So request based dm-ioband will have a request in hand once that request
> > > > has passed group control and prio control. Because dm-ioband is a device
> > > > mapper target, one can put it on higher level devices (practically taking
> > > > CFQ at higher level device), and provide fairness there. One can also
> > > > put it on those SSDs which don't use IO scheduler (this is kind of forcing
> > > > them to use the IO scheduler.)
> > > >
> > > > I am sure there will be many issues but one big issue I could think of is
> > > > that CFQ thinks that there is one device beneath it and dispatches requests
> > > > from one queue (in case of idling) and that would kill parallelism at
> > > > higher layer and throughput will suffer on many of the dm/md configurations.
> > > >
> > > > Thanks
> > > > Vivek
> > >
> > > As long as using CFQ, your idea is reasonable for me. But how about for
> > > other IO schedulers? In my understanding, one of the keys to guarantee
> > > group isolation in your patch is to have per-group IO scheduler internal
> > > queue even with as, deadline, and noop scheduler. I think this is
> > > a great idea, and to implement generic code for all IO schedulers was
> > > concluded when we had so many IO scheduler specific proposals.
> > > If we will still need per-group IO scheduler internal queues with
> > > request-based dm-ioband, we have to modify elevator layer. It seems
> > > out of scope of dm.
> > > I might miss something...
> >
> > IIUC, the request based device-mapper could not break back a request
> > into bio, so it could not work with block devices which don't use the
> > IO scheduler.
> >
>
> I think current request based multipath driver does not do it but can't it
> be implemented that requests are broken back into bio?

I guess it would be hard to implement, and we would need to hold
requests and throttle them there, which would break the ordering
done by CFQ.

> Anyway, I don't feel too strongly about this approach as it might
> introduce more serialization at higher layer.

Yes, I know it.

> > How about adding a callback function to the higher level controller?
> > CFQ calls it when the active queue runs out of time, then the higher
> > level controller uses it as a trigger or a hint to move IO group, so
> > I think a time-based controller could be implemented at higher level.
> >
>
> Adding a call back should not be a big issue. But that means you are
> planning to run only one group at higher layer at one time and I think
> that's the problem because then we are introducing serialization at higher
> layer. So any higher level device mapper target which has multiple
> physical disks under it, we might be underutilizing these even more and
> take a big hit on overall throughput.
>
> The whole design of doing proportional weight at lower layer is optimal
> usage of system.

But I think that the higher level approach makes it easy to configure
against striped software raid devices. If one would like to combine
some physical disks into one logical device like a dm-linear, I think
one should map the IO controller on each physical device and combine
them into one logical device.

> > My requirements for IO controller are:
> > - Implement as a higher level controller, which is located at block
> >   layer and bio is grabbed in generic_make_request().
>
> How are you planning to handle the issue of buffered writes Andrew raised?

I think that it would be better to use the higher-level controller
along with the memory controller and limit memory usage for each
cgroup. And as Kamezawa-san said, having limits on dirty pages would
be better, too.

> > - Can work with any type of IO scheduler.
> > - Can work with any type of block devices.
> > - Support multiple policies, proportional weight, max rate, time
> >   based, and so on.
> >
> > The IO controller mini-summit will be held next week, and I'm
> > looking forward to meeting you all and discussing the IO controller.
> > https://sourceforge.net/apps/trac/ioband/wiki/iosummit
>
> Is there a new version of dm-ioband now where you have solved the issue of
> sync/async dispatch within group? Before meeting at mini-summit, I am
> trying to run some tests and come up with numbers so that we have more
> clear picture of pros/cons.

Yes, I've released new versions of dm-ioband and blkio-cgroup. The new
dm-ioband handles sync/async IO requests separately and the
write-starve-read issue you pointed out is fixed. I would appreciate
it if you would try them.
http://sourceforge.net/projects/ioband/files/

Thanks,
Ryo Tsuruta