Date: Tue, 15 Sep 2009 17:40:32 -0400
From: Vivek Goyal
To: Ryo Tsuruta
Cc: linux-kernel@vger.kernel.org, dm-devel@redhat.com,
    dhaval@linux.vnet.ibm.com, jens.axboe@oracle.com, agk@redhat.com,
    akpm@linux-foundation.org, nauman@google.com,
    guijianfeng@cn.fujitsu.com, jmoyer@redhat.com
Subject: dm-ioband fairness in terms of sectors seems to be killing disk
    (Was: Re: Regarding dm-ioband tests)
Message-ID: <20090915214032.GB3711@redhat.com>
References: <20090901165011.GB3753@redhat.com>
    <20090901174724.GC3753@redhat.com>
    <20090903131146.GA12041@redhat.com>
    <20090904.101222.226781140.ryov@valinux.co.jp>
In-Reply-To: <20090904.101222.226781140.ryov@valinux.co.jp>

On Fri, Sep 04, 2009 at 10:12:22AM +0900, Ryo Tsuruta wrote:
> Hi Vivek,
>
> Vivek Goyal wrote:
> > On Tue, Sep 01, 2009 at 01:47:24PM -0400, Vivek Goyal wrote:
> > > On Tue, Sep 01, 2009 at 12:50:11PM -0400, Vivek Goyal wrote:
> > > > Hi Ryo,
> > > >
> > > > I decided to play a bit more with dm-ioband and started doing some
> > > > testing. I am running a simple test with two dd threads doing reads
> > > > and don't seem to be getting fairness, so I thought I would ask you
> > > > what the issue is. Is there a problem with my testing procedure?
> > > >
> > > > I have one 40G SATA drive (no hardware queuing). I have created two
> > > > partitions on that disk, /dev/sdd1 and /dev/sdd2, and created two
> > > > ioband devices, ioband1 and ioband2, on partitions sdd1 and sdd2
> > > > respectively. The weights of the ioband1 and ioband2 devices are
> > > > 200 and 100 respectively.
> > > >
> > > > I am assuming that this setup will create two default groups and
> > > > that IO going to partition sdd1 should get double the BW of
> > > > partition sdd2.
> > > >
> > > > But it looks like I am not getting that behavior. Following is the
> > > > output of the "dmsetup table" command, taken every 2 seconds while
> > > > IO was going on. Column 9 seems to contain how many sectors of IO
> > > > have been done on a particular ioband device and group. Looking at
> > > > the snapshots, it does not look like the ioband1 default group got
> > > > double the BW of the ioband2 default group.
> > > >
> > > > Am I doing something wrong here?
> >
> > Hi Ryo,
> >
> > Did you get a chance to look into it? Am I doing something wrong, or
> > is it an issue with dm-ioband?
>
> Sorry, I missed it. I'll look into it and report back to you.

Hi Ryo,

I am running a sequential reader in one group and a few random readers
and writers in a second group. Both groups have the same weight. I ran
the fio scripts for 60 seconds and then looked at the output. In this
case it looks like we just kill the throughput of the sequential reader
and of the disk, because the random readers/writers take over.

I ran the test "with-dm-ioband", "without-dm-ioband" and "with io
scheduler based io controller".
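The per-group sector counts referred to in the quoted mail came from
sampling "dmsetup table" periodically. A loop along the following lines
is just a rough sketch of that sampling, reusing the ioband1/ioband2
device names from the setup described above:

    # rough sketch: every 2 seconds, print the cumulative sector count
    # from column 9 of "dmsetup table" for each ioband device
    while true; do
            for d in ioband1 ioband2; do
                    dmsetup table "$d" | awk -v dev="$d" '{ print dev, $9 }'
            done
            echo "----"
            sleep 2
    done

Comparing the deltas between consecutive samples for the two devices
shows whether ioband1 really gets about twice the BW of ioband2, as the
200:100 weights would suggest.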
First I am pasting the results and at the end I will paste my test
scripts. I have cut the fio output heavily so that we do not get lost in
lots of output.

with-dm-ioband
==============
ioband1
-------
randread: (groupid=0, jobs=4): err= 0: pid=3610
  read : io=18,432KiB, bw=314KiB/s, iops=76, runt= 60076msec
    clat (usec): min=140, max=744K, avg=50866.75, stdev=61266.88

randwrite: (groupid=1, jobs=2): err= 0: pid=3614
  write: io=920KiB, bw=15KiB/s, iops=3, runt= 60098msec
    clat (usec): min=203, max=14,171K, avg=522937.86, stdev=960929.44

ioband2
-------
seqread0: (groupid=0, jobs=1): err= 0: pid=3609
  read : io=37,904KiB, bw=636KiB/s, iops=155, runt= 61026msec
    clat (usec): min=92, max=9,969K, avg=6437.89, stdev=168573.23

without dm-ioband (vanilla cfq, no grouping)
============================================
seqread0: (groupid=0, jobs=1): err= 0: pid=3969
  read : io=321MiB, bw=5,598KiB/s, iops=1,366, runt= 60104msec
    clat (usec): min=91, max=763K, avg=729.61, stdev=17402.63

randread: (groupid=0, jobs=4): err= 0: pid=3970
  read : io=15,112KiB, bw=257KiB/s, iops=62, runt= 60039msec
    clat (usec): min=124, max=1,066K, avg=63721.26, stdev=78215.17

randwrite: (groupid=1, jobs=2): err= 0: pid=3974
  write: io=680KiB, bw=11KiB/s, iops=2, runt= 60073msec
    clat (usec): min=199, max=24,646K, avg=706719.51, stdev=1774887.55

With io scheduler based io controller patches
=============================================
cgroup 1 (weight 100)
---------------------
randread: (groupid=0, jobs=4): err= 0: pid=2995
  read : io=9,484KiB, bw=161KiB/s, iops=39, runt= 60107msec
    clat (msec): min=1, max=2,167, avg=95.47, stdev=131.60

randwrite: (groupid=1, jobs=2): err= 0: pid=2999
  write: io=2,692KiB, bw=45KiB/s, iops=11, runt= 60131msec
    clat (usec): min=199, max=30,043K, avg=178710.05, stdev=1281485.75

cgroup 2 (weight 100)
---------------------
seqread0: (groupid=0, jobs=1): err= 0: pid=2993
  read : io=547MiB, bw=9,556KiB/s, iops=2,333, runt= 60043msec
    clat (usec): min=92, max=224K, avg=426.74, stdev=5734.12

Note the BW of the sequential reader in the three cases (636KiB/s,
5,598KiB/s and 9,556KiB/s). dm-ioband tries to provide fairness in terms
of the number of sectors, and that completely kills the disk throughput.
With the io scheduler based io controller we see increased throughput
for the sequential reader as compared to plain CFQ, because the random
readers now run in a separate group and hence the sequential reader is
isolated from them.

Here are my fio jobs
--------------------
First fio job file
------------------
[global]
runtime=60

[randread]
rw=randread
size=2G
iodepth=20
directory=/mnt/sdd1/fio/
direct=1
numjobs=4
group_reporting

[randwrite]
rw=randwrite
size=1G
iodepth=20
directory=/mnt/sdd1/fio/
group_reporting
direct=1
numjobs=2

Second fio job file
-------------------
[global]
runtime=60
rw=read
size=4G
directory=/mnt/sdd2/fio/
direct=1

[seqread0]
numjobs=1
group_reporting

Thanks
Vivek