Date: Thu, 16 Apr 2009 22:11:39 -0400
From: Vivek Goyal
To: Ryo Tsuruta
Cc: agk@redhat.com, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
    Nauman Rafique, Fernando Luis Vázquez Cao, Andrea Righi, Jens Axboe,
    Balbir Singh, Jeff Moyer, Andrew Morton
Subject: Re: dm-ioband: Test results.
Message-ID: <20090417021139.GB23152@redhat.com>
References: <20090413.130552.226792299.ryov@valinux.co.jp> <20090416205720.GI8896@redhat.com>
In-Reply-To: <20090416205720.GI8896@redhat.com>

On Thu, Apr 16, 2009 at 04:57:20PM -0400, Vivek Goyal wrote:
> On Mon, Apr 13, 2009 at 01:05:52PM +0900, Ryo Tsuruta wrote:
> > Hi Alasdair and all,
> >
> > I did more tests on dm-ioband and I've posted the test items and
> > results on my website. The results are very good.
> > http://people.valinux.co.jp/~ryov/dm-ioband/test/test-items.xls
> >
> > I hope someone will test dm-ioband and report back to the dm-devel
> > mailing list.
> >
>

Ok, one more test. This time to show that with a single queue and FIFO
dispatch, a writer can easily starve the reader.

I have created two partitions, /dev/sda1 and /dev/sda2, and two ioband
devices, ioband1 and ioband2, on /dev/sda1 and /dev/sda2 respectively,
with weights 40 and 20.

I am launching an aggressive writer dd with prio 7 (best effort) and a
reader dd with prio 0 (best effort). Following is my script.

****************************************************************
rm /mnt/sdd1/aggressivewriter
sync
echo 3 > /proc/sys/vm/drop_caches

# launch a hostile writer
ionice -c2 -n7 dd if=/dev/zero of=/mnt/sdd1/aggressivewriter bs=4K count=524288 conv=fdatasync &

# Reader
ionice -c 2 -n 0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null &
wait $!
echo "reader finished"
**********************************************************************
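For anyone who wants to try to reproduce this, the ioband devices themselves
can be created with dmsetup along the following lines. This is only a rough
sketch from memory of the dm-ioband documentation; the weights 40 and 20 are
the ones used above, but please double-check the exact ioband table fields
against the documentation that ships with the dm-ioband patches.

****************************************************************
# create ioband1 (weight 40) on /dev/sda1 and ioband2 (weight 20) on /dev/sda2;
# table fields other than the device and the ":<weight>" of the default group
# are taken on trust from the dm-ioband docs and may need adjusting
echo "0 $(blockdev --getsize /dev/sda1) ioband /dev/sda1 1 0 0 none weight 0 :40" | dmsetup create ioband1
echo "0 $(blockdev --getsize /dev/sda2) ioband /dev/sda2 1 0 0 none weight 0 :20" | dmsetup create ioband2

# verify, then mkfs/mount the ioband devices where the test script expects them
dmsetup status
****************************************************************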
Following are the results without and with dm-ioband.

Without dm-ioband
-----------------

First run
2147483648 bytes (2.1 GB) copied, 46.4747 s, 46.2 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 87.9293 s, 24.4 MB/s (Writer)

Second run
2147483648 bytes (2.1 GB) copied, 47.6461 s, 45.1 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 89.0781 s, 24.1 MB/s (Writer)

Third run
2147483648 bytes (2.1 GB) copied, 51.0624 s, 42.1 MB/s (Reader)
reader finished
2147483648 bytes (2.1 GB) copied, 91.9507 s, 23.4 MB/s (Writer)

With dm-ioband
--------------

2147483648 bytes (2.1 GB) copied, 54.895 s, 39.1 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 88.6323 s, 24.2 MB/s (Reader)
reader finished

2147483648 bytes (2.1 GB) copied, 62.6102 s, 34.3 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 91.6662 s, 23.4 MB/s (Reader)
reader finished

2147483648 bytes (2.1 GB) copied, 58.9928 s, 36.4 MB/s (Writer)
2147483648 bytes (2.1 GB) copied, 90.6707 s, 23.7 MB/s (Reader)
reader finished

I have marked which dd finished first. I determined it with the help of the
wait command and by monitoring "iostat -d 5 sdd1" to see how the IO rates
were varying. Notice that with dm-ioband it is a complete reversal of
fortunes: the reader is completely starved by the aggressive writer.
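By the way, an easy way to get the finish order without marking the dd output
by hand is to timestamp each dd as it completes. A minimal variant of the
script above (same dd invocations, just wrapped in subshells):

****************************************************************
# each subshell prints its own completion time the moment its dd exits,
# so the order of the two "finished" lines is the finish order
(
        ionice -c2 -n7 dd if=/dev/zero of=/mnt/sdd1/aggressivewriter bs=4K count=524288 conv=fdatasync
        echo "writer finished at $(date +%T)"
) &
(
        ionice -c2 -n0 dd if=/mnt/sdd1/testzerofile1 of=/dev/null
        echo "reader finished at $(date +%T)"
) &
wait
****************************************************************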
I think you should be able to reproduce this one easily with the script.

I don't understand how a single queue with FIFO dispatch does not break the
notion of CFQ classes and priorities.

Thanks
Vivek

> Ok, here are more test results. This time I am trying to see how fairness
> is provided for async writes and how it impacts throughput.
>
> I have created two partitions, /dev/sda1 and /dev/sda2, and two ioband
> devices, ioband1 and ioband2, on /dev/sda1 and /dev/sda2 respectively,
> with weights 40 and 40.
>
> # dmsetup status
> ioband2: 0 38025855 ioband 1 -1 150 8 186 1 0 8
> ioband1: 0 40098177 ioband 1 -1 150 8 186 1 0 8
>
> I ran the following two fio jobs, one in each partition.
>
> ************************************************************
> echo cfq > /sys/block/sdd/queue/scheduler
> sync
> echo 3 > /proc/sys/vm/drop_caches
>
> fio_args="--size=64m --rw=write --numjobs=50 --group_reporting"
> time fio $fio_args --name=test1 --directory=/mnt/sdd1/fio/ --output=test1.log &
> time fio $fio_args --name=test2 --directory=/mnt/sdd2/fio/ --output=test2.log &
> wait
> *****************************************************************
>
> Following are the fio job finish times with and without dm-ioband:
>
>                        first job      second job
> without dm-ioband      3m29.947s      4m1.436s
> with dm-ioband         8m42.532s      8m43.328s
>
> That is roughly a 100% performance regression (both jobs take about twice
> as long) in this particular setup.
>
> I think this regression is introduced because we wait too long for the
> slower group to catch up, to make sure the proportionate numbers look
> right, and we choke the writes even when the device is free.
>
> It is a hard problem to solve, because async write traffic is bursty when
> seen at the block layer, and we do not necessarily see a larger amount of
> write traffic dispatched from the higher-priority process/group. So what
> does one do? Wait for the other groups to catch up so that the
> proportionate numbers look right, and thereby let the disk sit idle and
> kill performance? Or just continue and not idle too much (a small amount
> of idling, like 8ms for a sync queue, might still be ok)?
>
> I think there might not be much benefit in providing an artificial notion
> of maintaining proportionate ratios if it kills performance. We should
> instead audit the async write path and see where the higher-weight
> application/group gets stuck.
>
> In my simple two-dd test, I could see bursty traffic from the high-prio
> app which would then sometimes disappear for .2 to .8 seconds. If, during
> such a gap, I wait for the higher-priority group to catch up, I end up
> keeping the disk idle for up to .8 seconds and kill performance. I guess
> the better way is not to wait that long (even if it gives the application
> the impression that the IO scheduler is not doing its job of assigning
> proportionate disk time), and over time see if we can fix some things in
> the async write path so that the traffic reaching the IO scheduler is
> smoother.
>
> Thoughts?
>
> Thanks
> Vivek
>
> > Alasdair, could you please merge dm-ioband into upstream? Or could
> > you please tell me why dm-ioband can't be merged?
> >
> > Thanks,
> > Ryo Tsuruta
> >
> > To know the details of dm-ioband:
> > http://people.valinux.co.jp/~ryov/dm-ioband/
> >
> > RPM packages for RHEL5 and CentOS5 are available:
> > http://people.valinux.co.jp/~ryov/dm-ioband/binary.html
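P.S. To actually see the bursty/idle write pattern mentioned above (traffic
from the high-prio app disappearing for .2 to .8 seconds), a 5-second iostat
interval is too coarse. A crude sub-second sampler over /proc/diskstats is
enough. This is just a sketch: it assumes GNU sleep and date (for the
fractional interval and timestamps), that sdd1 is the device of interest,
and that the 10th column of its /proc/diskstats line (counting the
major/minor/name columns) is sectors written.

****************************************************************
prev=$(awk '$3 == "sdd1" { print $10 }' /proc/diskstats)
while sleep 0.2; do
        cur=$(awk '$3 == "sdd1" { print $10 }' /proc/diskstats)
        # long runs of near-zero deltas are the idle gaps described above
        echo "$(date +%T.%N)  sectors_written=$((cur - prev))"
        prev=$cur
done
****************************************************************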