Date: Fri, 26 Jun 2009 12:44:06 +0200
From: Jens Axboe
To: Wu Fengguang
Cc: Jeff Moyer, Ralf Gross, "linux-kernel@vger.kernel.org",
 linux-fsdevel@vger.kernel.org
Subject: Re: io-scheduler tuning for better read/write ratio
Message-ID: <20090626104406.GK23611@kernel.dk>
References: <20090616154342.GA7043@p15145560.pureserver.info>
 <4A37CB2A.6010209@davidnewall.com>
 <20090616184027.GB7043@p15145560.pureserver.info>
 <4A37E7DB.7030100@redhat.com>
 <20090616185600.GC7043@p15145560.pureserver.info>
 <20090622163113.GD12483@p15145560.pureserver.info>
 <20090626021905.GA23981@localhost>
In-Reply-To: <20090626021905.GA23981@localhost>

On Fri, Jun 26 2009, Wu Fengguang wrote:
> On Tue, Jun 23, 2009 at 03:42:46AM +0800, Jeff Moyer wrote:
> > Ralf Gross writes:
> > 
> > > Jeff Moyer schrieb:
> > >> Jeff Moyer writes:
> > >> 
> > >> > Ralf Gross writes:
> > >> > 
> > >> >> Casey Dahlin schrieb:
> > >> >>> On 06/16/2009 02:40 PM, Ralf Gross wrote:
> > >> >>> > David Newall schrieb:
> > >> >>> >> Ralf Gross wrote:
> > >> >>> >>> write throughput is much higher than the read throughput (40 MB/s
> > >> >>> >>> read, 90 MB/s write).
> > >> >>> > 
> > >> >>> > Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> > >> >>> > to the device at the same time.
> > >> >>> > 
> > >> >>> > Ralf
> > >> >>> 
> > >> >>> How specifically are you testing? It could depend a lot on the
> > >> >>> particular access patterns you're using to test.
> > >> >> 
> > >> >> I did the basic tests with tiobench. The real test is a test backup
> > >> >> (bacula) with 2 jobs that create 2 30 GB spool files on that device.
> > >> >> The jobs partially write to the device in parallel. Depending on which
> > >> >> spool file reaches the 30 GB first, one job starts reading from that
> > >> >> file and writing to tape, while the other is still spooling.
> > >> > 
> > >> > We are missing a lot of details here. I guess the first thing I'd try
> > >> > would be bumping up the max_readahead_kb parameter, since I'm guessing
> > >> > that your backup application isn't driving very deep queue depths. If
> > >> > that doesn't work, then please provide the exact invocations of tiobench
> > >> > that reproduce the problem, or some blktrace output for your real test.
> > >> 
> > >> Any news, Ralf?
> > > 
> > > Sorry for the delay. At the moment there are large backups running and
> > > using the raid device for spooling, so I can't do any tests.
> > > 
> > > Re. read-ahead: I tested different settings from 8 KB to 65 KB; this
> > > didn't help.
> > > 
> > > I'll do some more tests when the backups are done (3-4 more days).
> > 
> > The default is 128 KB, I believe, so it's strange that you would test
> > smaller values. ;) I would try something along the lines of 1 or 2 MB.
> > 
> > I'm CCing Fengguang in case he has any suggestions.
> 
> Jeff, thank you for forwarding this (and sorry for the long delay)!
> 
> The read:write (or rather sync:async) ratio control is an IO scheduler
> feature. CFQ has the parameters slice_sync and slice_async for that.
> What's more, CFQ will make async IO wait if there is any sync IO in
> flight. This is good, but not quite enough. Normally sync IOs come one
> by one, with a small idle time window in between. If we only start
> dispatching async IOs, e.g., 1 ms after the last sync IO has completed,
> then we can hold off the async background write IOs while there is an
> active sync foreground read IO stream.
> 
> This simple patch aims to address the writes-push-aside-reads problem.
> Ralf, you can try applying this patch and running your workload with
> this (huge) CFQ parameter:
> 
> echo 1000 > /sys/block/sda/queue/iosched/slice_sync
> 
> The patch is based on 2.6.30, but can be trivially backported if you
> want to use an older kernel.
> 
> It may impact overall (sync+async) IO throughput when there are one or
> more ongoing sync IO streams, so it requires considerable benchmarking
> and adjustment.
> 
> Thanks,
> Fengguang
> ---
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a55a9bd..14011b7 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
>  	if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
>  		return;
>  
> -	WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
>  	WARN_ON(cfq_cfqq_slice_new(cfqq));
>  
>  	/*
> @@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  	 * or if we want to idle in case it has no pending requests.
>  	 */
>  	if (cfqd->active_queue == cfqq) {
> -		const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
> -
>  		if (cfq_cfqq_slice_new(cfqq)) {
>  			cfq_set_prio_slice(cfqd, cfqq);
>  			cfq_clear_cfqq_slice_new(cfqq);
> @@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
>  		 */
>  		if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
>  			cfq_slice_expired(cfqd, 1);
> -		else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
> -			 sync && !rq_noidle(rq))
> +		else if (sync && !rq_noidle(rq) &&
> +			 !cfq_close_cooperator(cfqd, cfqq, 1))
>  			cfq_arm_slice_timer(cfqd);
>  	}

What's the purpose of this patch? If you have requests pending, you don't
want to arm the idle timer and wait, you want to dispatch those.

-- 
Jens Axboe
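
For completeness, the two suggestions in this thread come down to a handful of
sysfs writes. A minimal sketch follows; the device name sdb and the 2048 KB
read-ahead value are placeholders rather than values taken from the thread, and
the iosched/ directory only exists while CFQ is the active scheduler:

  #!/bin/sh
  # Sketch only: "sdb" and the 2048 KB read-ahead value are assumed placeholders.
  dev=sdb

  # Confirm CFQ is the active scheduler (the one shown in brackets).
  cat /sys/block/$dev/queue/scheduler

  # Jeff's suggestion: raise read-ahead from the 128 KB default to 1-2 MB.
  echo 2048 > /sys/block/$dev/queue/read_ahead_kb

  # Fengguang's suggestion: a (huge) 1000 ms sync slice, on top of his patch.
  echo 1000 > /sys/block/$dev/queue/iosched/slice_sync

  # Current async slice for comparison (defaults: slice_sync=100, slice_async=40).
  cat /sys/block/$dev/queue/iosched/slice_async

These settings are per device and are lost on reboot, so they would need to be
reapplied (e.g. from an init script) if they turn out to help.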
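Since exact test invocations were asked for earlier, a crude way to approximate
the bacula spooling pattern (one job still writing its 30 GB spool file while
another streams a finished spool file out) is a pair of dd processes on the
spool file system. The mount point and file names below are illustrative
assumptions only, not paths from the thread:

  #!/bin/sh
  # Rough stand-in for the spool workload, not the real bacula jobs:
  # stream a previously written file back in while a 30 GB sequential
  # write runs in the background.
  sync
  echo 3 > /proc/sys/vm/drop_caches   # make the read hit the disk, not the page cache

  dd if=/dev/zero of=/mnt/spool/writer.tmp bs=1M count=30720 &
  dd if=/mnt/spool/finished-spool.tmp of=/dev/null bs=1M
  wait

dd reports the throughput of each stream when it finishes, so the read figure
can be compared with and without the background write running, and again after
changing read_ahead_kb or slice_sync.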