2009-06-16 16:08:26

by Ralf Gross

Subject: io-scheduler tuning for better read/write ratio

Hi,

I'm trying to tune the kernel/io-scheduler for a better read/write
ratio on an Areca RAID0 device (4 disks, kernel 2.6.26, xfs fs).

I can get 200 MB/s sequential writes and about the same for sequential
reads. My problem is that if there are reads _and_ writes on this
device at the same time, the write throughput is much higher than the
read throughput (40 MB/s read, 90 MB/s write).

The deadline scheduler sounded like the way to go for getting better
read results, but regardless of which parameter I change, the ratio
stays the same.

I also tried cfq and noop with different parameter settings, but always
got the same result.

In short: is there a way to tune the kernel/scheduler settings to get
a higher read throughput? Writes are not that important; basically
there are only two 30 GB files on the device/filesystem that are used
to spool data for two LTO-4 tape drives. So I need a certain read
speed to keep both drives streaming. While data is being written to
one file, I need at least 50 MB/s for reading from the other file and
sending it to the tape drive.

Thanks, Ralf


2009-06-16 16:41:27

by David Newall

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross wrote:
> write throughput is much higher than the read throughput (40 MB/s
> read, 90 MB/s write).

Perhaps I've misunderstood, but isn't that common? Reads have to come
from disk, whereas writes get cached by the drive.

2009-06-16 18:40:57

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

David Newall wrote:
> Ralf Gross wrote:
> > write throughput is much higher than the read throughput (40 MB/s
> > read, 90 MB/s write).

Hm, but I get higher read throughput (160-200 MB/s) if I don't write
to the device at the same time.

Ralf

2009-06-16 18:46:26

by Casey Dahlin

Subject: Re: io-scheduler tuning for better read/write ratio

On 06/16/2009 02:40 PM, Ralf Gross wrote:
> David Newall wrote:
>> Ralf Gross wrote:
>>> write throughput is much higher than the read throughput (40 MB/s
>>> read, 90 MB/s write).
>
> Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> to the device at the same time.
>
> Ralf

How specifically are you testing? It could depend a lot on the particular access patterns you're using to test.

--CJD

2009-06-16 18:56:27

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

Casey Dahlin wrote:
> On 06/16/2009 02:40 PM, Ralf Gross wrote:
> > David Newall wrote:
> >> Ralf Gross wrote:
> >>> write throughput is much higher than the read throughput (40 MB/s
> >>> read, 90 MB/s write).
> >
> > Hm, but I get higher read throughput (160-200 MB/s) if I don't write
> > to the device at the same time.
> >
> > Ralf
>
> How specifically are you testing? It could depend a lot on the
> particular access patterns you're using to test.

I did the basic tests with tiobench. The real test is a test backup
(bacula) with two jobs that create two 30 GB spool files on that device.
The jobs partially write to the device in parallel. Depending on which
spool file reaches 30 GB first, one job starts reading from that file
and writing to tape, while the other is still spooling.

Ralf

2009-06-16 20:20:05

by Jeff Moyer

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross writes:

> I did the basic tests with tiobench. The real test is a test backup
> (bacula) with two jobs that create two 30 GB spool files on that device.
> The jobs partially write to the device in parallel. Depending on which
> spool file reaches 30 GB first, one job starts reading from that file
> and writing to tape, while the other is still spooling.

We are missing a lot of details here. I guess the first thing I'd try
would be bumping up the read_ahead_kb parameter, since I'm guessing
that your backup application isn't driving very deep queue depths. If
that doesn't work, then please provide the exact tiobench invocations
that reproduce the problem, or some blktrace output for your real test.

Cheers,
Jeff

2009-06-22 14:43:36

by Jeff Moyer

Subject: Re: io-scheduler tuning for better read/write ratio

Jeff Moyer writes:

> We are missing a lot of details here. I guess the first thing I'd try
> would be bumping up the read_ahead_kb parameter, since I'm guessing
> that your backup application isn't driving very deep queue depths. If
> that doesn't work, then please provide the exact tiobench invocations
> that reproduce the problem, or some blktrace output for your real test.

Any news, Ralf?

Cheers,
Jeff

2009-06-22 16:31:40

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

Jeff Moyer wrote:
> Any news, Ralf?

Sorry for the delay. At the moment large backups are running and using
the RAID device for spooling, so I can't do any tests.

Re. readahead: I tested different settings from 8Kb to 65Kb; this
didn't help.

I'll do some more tests when the backups are done (3-4 more days).

Thanks, Ralf

2009-06-22 19:43:09

by Jeff Moyer

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross writes:

> Sorry for the delay. At the moment large backups are running and using
> the RAID device for spooling, so I can't do any tests.
>
> Re. readahead: I tested different settings from 8Kb to 65Kb; this
> didn't help.
>
> I'll do some more tests when the backups are done (3-4 more days).

The default is 128KB, I believe, so it's strange that you would test
smaller values. ;) I would try something along the lines of 1 or 2 MB.

I'm CCing Fengguang in case he has any suggestions.

Cheers,
Jeff

p.s. Fengguang, the thread starts here:
http://lkml.org/lkml/2009/6/16/390

2009-06-23 07:24:48

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

Jeff Moyer wrote:
> The default is 128KB, I believe, so it's strange that you would test
> smaller values. ;) I would try something along the lines of 1 or 2 MB.

Err, yes, this should have been MB, not KB.


$cat /sys/block/sdc/queue/read_ahead_kb
16384
$cat /sys/block/sdd/queue/read_ahead_kb
16384

I also tried different values for max_sectors_kb and nr_requests, but
the trend that writes were much faster than reads under mixed read and
write load on the device didn't change.

Changing the deadline parameters writes_starved, write_expire,
read_expire, front_merges or fifo_batch didn't change this behavior.

Ralf

2009-06-23 13:53:49

by Jeff Moyer

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross writes:

> Err, yes, this should have been MB, not KB.
>
>
> $cat /sys/block/sdc/queue/read_ahead_kb
> 16384
> $cat /sys/block/sdd/queue/read_ahead_kb
> 16384
>
> I also tried different values for max_sectors_kb and nr_requests, but
> the trend that writes were much faster than reads under mixed read and
> write load on the device didn't change.
>
> Changing the deadline parameters writes_starved, write_expire,
> read_expire, front_merges or fifo_batch didn't change this behavior.

OK, bumping up readahead and changing the deadline parameters listed
should have given some better results, I would think. Can you give the
invocation of tiobench you used so I can try to reproduce this?

Thanks!
Jeff

2009-06-24 07:26:24

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

Jeff Moyer wrote:
> OK, bumping up readahead and changing the deadline parameters listed
> should have given some better results, I would think. Can you give the
> invocation of tiobench you used so I can try to reproduce this?

The main problem is with bacula. It reads/writes from/to two
spool files on the same device.

I get the same behavior with two dd processes, one reading from the
disk, one writing to it.

Here's the output from dstat (5 s interval).

--dsk/md1--
_read _writ
26M 95M
31M 96M
20M 85M
31M 108M
28M 89M
24M 95M
26M 79M
32M 115M
50M 74M
129M 15k
147M 1638B
147M 0
147M 0
113M 0


At the end I stopped the dd process that was writing to the device, so
you can see that the md device is capable of reading at >120 MB/s.
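As a rough sketch of what the first nine (mixed read/write) dstat samples above amount to, the C helpers below average the two columns. The arrays are copied from that output, treating dstat's "M" values as MB/s; the function names are illustrative only.

```c
#include <stddef.h>

/* Mixed-load samples copied from the first nine rows of the dstat
 * output above (before the writing dd was stopped), in MB/s. */
const double deadline_read[]  = {26, 31, 20, 31, 28, 24, 26, 32, 50};
const double deadline_write[] = {95, 96, 85, 108, 89, 95, 79, 115, 74};

/* Sum of n samples. */
double sum_mb(const double *samples, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum;
}

/* Mean of n samples; 0 for an empty series. */
double mean_mb(const double *samples, size_t n)
{
    return n ? sum_mb(samples, n) / (double)n : 0.0;
}

/* Fraction of all transferred data that was read over the n samples. */
double read_share(const double *reads, const double *writes, size_t n)
{
    double r = sum_mb(reads, n);
    double w = sum_mb(writes, n);
    return (r + w) > 0.0 ? r / (r + w) : 0.0;
}
```

With these numbers reads average about 29.8 MB/s against about 92.9 MB/s for writes, i.e. reads get roughly a quarter (about 24%) of the bandwidth, well below the 50 MB/s needed to keep one tape drive streaming.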

I did this with these two commands.

dd if=/dev/zero of=test bs=1MB
dd if=/dev/md1 of=/dev/null bs=1M


Maybe this is too simple, but with a real-world application I see the
same behavior. md1 is an md RAID 0 device with two disks.


md1 : active raid0 sdc[0] sdd[1]
781422592 blocks 64k chunks

sdc:

/sys/block/sdc/queue/hw_sector_size
512
/sys/block/sdc/queue/max_hw_sectors_kb
32767
/sys/block/sdc/queue/max_sectors_kb
512
/sys/block/sdc/queue/nomerges
0
/sys/block/sdc/queue/nr_requests
128
/sys/block/sdc/queue/read_ahead_kb
16384
/sys/block/sdc/queue/scheduler
noop anticipatory [deadline] cfq

/sys/block/sdc/queue/iosched/fifo_batch
16
/sys/block/sdc/queue/iosched/front_merges
1
/sys/block/sdc/queue/iosched/read_expire
500
/sys/block/sdc/queue/iosched/write_expire
5000
/sys/block/sdc/queue/iosched/writes_starved
2


sdd:

/sys/block/sdd/queue/hw_sector_size
512
/sys/block/sdd/queue/max_hw_sectors_kb
32767
/sys/block/sdd/queue/max_sectors_kb
512
/sys/block/sdd/queue/nomerges
0
/sys/block/sdd/queue/nr_requests
128
/sys/block/sdd/queue/read_ahead_kb
16384
/sys/block/sdd/queue/scheduler
noop anticipatory [deadline] cfq


/sys/block/sdd/queue/iosched/fifo_batch
16
/sys/block/sdd/queue/iosched/front_merges
1
/sys/block/sdd/queue/iosched/read_expire
500
/sys/block/sdd/queue/iosched/write_expire
5000
/sys/block/sdd/queue/iosched/writes_starved
2


The deadline parameters are the default ones. I expected setting
writes_starved much higher to change the read/write ratio, but I didn't
see any change.
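For reference, here is a simplified model of how the deadline scheduler (block/deadline-iosched.c) picks the direction of the next batch when both reads and writes are queued. It is a sketch under stated assumptions: both queues are assumed never to run empty, and FIFO expiry, fifo_batch and request sizes are ignored. The function and type names are made up for this example.

```c
#include <stddef.h>

/* Hypothetical toy model of deadline's read/write batch choice. */
enum dir { DIR_READ = 0, DIR_WRITE = 1 };

/* Fills out[0..n-1] with the direction chosen for n consecutive batches,
 * assuming reads and writes are always pending, and returns the number
 * of write batches. */
size_t simulate_batches(unsigned writes_starved, enum dir *out, size_t n)
{
    unsigned starved = 0;   /* read batches chosen while writes waited */
    size_t write_batches = 0;

    for (size_t i = 0; i < n; i++) {
        /* Reads win until writes have been passed over writes_starved
         * times; then one write batch is dispatched and the counter resets. */
        if (starved++ >= writes_starved) {
            starved = 0;
            out[i] = DIR_WRITE;
            write_batches++;
        } else {
            out[i] = DIR_READ;
        }
    }
    return write_batches;
}
```

With the default writes_starved = 2 this yields read, read, write, repeating. The count is per batch, not per byte, so the model says nothing about how much data each batch moves. Also, in the real scheduler writes are dispatched whenever no reads are queued at all, regardless of writes_starved, which the always-busy assumption here leaves out.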



Ralf

2009-06-24 07:56:48

by Al Boldi

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross wrote:
> The main problem is with bacula. It reads/writes from/to two
> spoolfiles on the same device.
>
> I get the same behavior with 2 dd processes, one reading from disk, one
> writing to it.
>
> I did this with these two commands.
>
> dd if=/dev/zero of=test bs=1MB
> dd if=/dev/md1 of=/dev/null bs=1M

Try changing /proc/sys/vm/dirty_ratio = 1


Thanks!

--
Al

2009-06-25 07:27:21

by Ralf Gross

Subject: Re: io-scheduler tuning for better read/write ratio

Al Boldi wrote:
> Try changing /proc/sys/vm/dirty_ratio = 1

$cat /proc/sys/vm/dirty_ratio
1


$dstat -D md1 -d 5
--dsk/md1--
_read _writ
18M 18M
0 0
820k 101M
18M 113M
26M 73M
26M 110M
32M 100M
19M 111M
13M 117M
13M 142M
32M 88M
26M 99M
38M 58M

No change. Even setting dirty_ratio to 100 didn't show any difference.


With the cfq scheduler and slice_idle = 24 (found by trial and error) I
get better results. I tried this before, but the overall throughput was
a bit lower than with deadline.

It seems that I cannot tune deadline to get the same behaviour.

--dsk/md1--
_read _writ
18M 18M
25M 77M
51M 65M
51M 47M
62M 45M
53M 28M
45M 43M
46M 47M
47M 42M
51M 41M
38M 51M
51M 40M
45M 40M
58M 42M
69M 41M
72M 42M
122M 0
141M 340k
--dsk/md1--
_read _writ
139M 562k
136M 0
141M 13k
64M 0
1638B 104M
0 110M
0 122M
0 104M
0 108M

The last numbers are for reading only, then writing only.

Ralf

2009-06-25 13:44:38

by Al Boldi

Subject: Re: io-scheduler tuning for better read/write ratio

Ralf Gross wrote:
> Al Boldi wrote:
> > Try changing /proc/sys/vm/dirty_ratio = 1
>
> $cat /proc/sys/vm/dirty_ratio
> 1
>
> No change. Even setting dirty_ratio to 100 didn't show any difference.

What's your readahead? Check it with blockdev --getra /dev/sdX and
/dev/mdX. Try increasing it while keeping dirty_ratio low.
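A note on units when comparing the two interfaces: /sys/block/<dev>/queue/read_ahead_kb is in KiB, while blockdev --getra and --setra report and take 512-byte sectors. The md device also has its own readahead setting, separate from its member disks. A minimal C sketch of the conversion (the helper names are made up for illustration):

```c
/* Hypothetical helpers: read_ahead_kb (sysfs) is in KiB, while
 * blockdev --getra/--setra use 512-byte sectors. */
#define RA_SECTOR_BYTES 512UL

unsigned long ra_kb_to_sectors(unsigned long kb)
{
    return kb * 1024UL / RA_SECTOR_BYTES;
}

unsigned long ra_sectors_to_kb(unsigned long sectors)
{
    return sectors * RA_SECTOR_BYTES / 1024UL;
}
```

So the 16384 KB set on sdc/sdd corresponds to blockdev --getra 32768, the 128 KB default to 256 sectors, and the 2 MB Jeff suggested to 4096 sectors.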


Thanks!

--
Al

2009-06-26 02:19:24

by Fengguang Wu

Subject: Re: io-scheduler tuning for better read/write ratio

On Tue, Jun 23, 2009 at 03:42:46AM +0800, Jeff Moyer wrote:
> The default is 128KB, I believe, so it's strange that you would test
> smaller values. ;) I would try something along the lines of 1 or 2 MB.
>
> I'm CCing Fengguang in case he has any suggestions.

Jeff, thank you for forwarding this (and sorry for the long delay)!

The read:write (or rather sync:async) ratio control is an IO scheduler
feature. CFQ has the parameters slice_sync and slice_async for that.
What's more, CFQ makes async IO wait while any sync IO is in flight.
This is good, but not quite enough: sync IOs normally arrive one by
one, with a small idle time window in between. If we only start
dispatching async IOs once the last sync IO has been complete for,
e.g., 1 ms, then we can hold back the async background write IOs while
there is an active sync foreground read IO stream.
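The idling decision that the patch in this message changes can be modelled as a small predicate. This is a simplified sketch of the branch in cfq_completed_request() for the active queue, as it appears in the diff (2.6.30 CFQ); the function and parameter names are illustrative, and the further checks inside cfq_arm_slice_timer() itself (for example slice_idle being 0) are ignored.

```c
#include <stdbool.h>

/* Hypothetical model: returns true if CFQ would arm the idle (slice)
 * timer after a request of the active queue completes, i.e. keep the
 * disk for that queue instead of moving on, possibly to async writes. */
bool cfq_would_idle(bool patched, bool slice_used, bool idle_class,
                    bool queue_empty, bool sync, bool rq_noidle,
                    bool close_cooperator)
{
    if (slice_used || idle_class)
        return false;           /* the slice is expired instead */
    if (!patched && !queue_empty)
        return false;           /* unpatched: idle only if nothing is queued */
    return sync && !rq_noidle && !close_cooperator;
}
```

The only difference between the two variants is the queue_empty condition: with the patch, a sync queue that still has requests pending also gets the idle window, so async writes have fewer chances to slip in between reads.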

This simple patch aims to address the writes-push-aside-reads problem.
Ralf, you can try applying this patch and running your workload with
this (huge) CFQ parameter:

echo 1000 > /sys/block/sda/queue/iosched/slice_sync

The patch is based on 2.6.30, but can be trivially backported if you
want to use an older kernel.

It may impact overall (sync+async) IO throughput when there are one or
more ongoing sync IO streams, so it requires considerable benchmarking
and adjustment.

Thanks,
Fengguang
---

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index a55a9bd..14011b7 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
return;

- WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
WARN_ON(cfq_cfqq_slice_new(cfqq));

/*
@@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
* or if we want to idle in case it has no pending requests.
*/
if (cfqd->active_queue == cfqq) {
- const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
-
if (cfq_cfqq_slice_new(cfqq)) {
cfq_set_prio_slice(cfqd, cfqq);
cfq_clear_cfqq_slice_new(cfqq);
@@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
*/
if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
cfq_slice_expired(cfqd, 1);
- else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
- sync && !rq_noidle(rq))
+ else if (sync && !rq_noidle(rq) &&
+ !cfq_close_cooperator(cfqd, cfqq, 1))
cfq_arm_slice_timer(cfqd);
}

2009-06-26 10:44:18

by Jens Axboe

Subject: Re: io-scheduler tuning for better read/write ratio

On Fri, Jun 26 2009, Wu Fengguang wrote:
> Jeff, thank you for forwarding this (and sorry for the long delay)!
>
> The read:write (or rather sync:async) ratio control is an IO scheduler
> feature. CFQ has the parameters slice_sync and slice_async for that.
> What's more, CFQ makes async IO wait while any sync IO is in flight.
> This is good, but not quite enough. Normally sync IOs come one by
> one, with a small idle time window in between. If we only start
> dispatching async IOs after the last sync IO has completed for, e.g.,
> 1ms, then we can hold back the async background write IOs while there
> is an active sync foreground read IO stream.
>
> This simple patch aims to address the writes-push-aside-reads problem.
> Ralf, you can try applying this patch and running your workload with
> this (huge) CFQ parameter:
>
> echo 1000 > /sys/block/sda/queue/iosched/slice_sync
>
> The patch is based on 2.6.30, but can be trivially backported if you
> want to use an older kernel.
>
> It may impact overall (sync+async) IO throughput when there are one or
> more ongoing sync IO streams, so it requires considerable benchmarking
> and tuning.
>
> Thanks,
> Fengguang
> ---
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index a55a9bd..14011b7 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -1064,7 +1064,6 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd)
> if (blk_queue_nonrot(cfqd->queue) && cfqd->hw_tag)
> return;
>
> - WARN_ON(!RB_EMPTY_ROOT(&cfqq->sort_list));
> WARN_ON(cfq_cfqq_slice_new(cfqq));
>
> /*
> @@ -2175,8 +2174,6 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
> * or if we want to idle in case it has no pending requests.
> */
> if (cfqd->active_queue == cfqq) {
> - const bool cfqq_empty = RB_EMPTY_ROOT(&cfqq->sort_list);
> -
> if (cfq_cfqq_slice_new(cfqq)) {
> cfq_set_prio_slice(cfqd, cfqq);
> cfq_clear_cfqq_slice_new(cfqq);
> @@ -2190,8 +2187,8 @@ static void cfq_completed_request(struct request_queue *q, struct request *rq)
> */
> if (cfq_slice_used(cfqq) || cfq_class_idle(cfqq))
> cfq_slice_expired(cfqd, 1);
> - else if (cfqq_empty && !cfq_close_cooperator(cfqd, cfqq, 1) &&
> - sync && !rq_noidle(rq))
> + else if (sync && !rq_noidle(rq) &&
> + !cfq_close_cooperator(cfqd, cfqq, 1))
> cfq_arm_slice_timer(cfqd);
> }

What's the purpose of this patch? If you have requests pending, you don't
want to arm the idle timer and wait; you want to dispatch them.

--
Jens Axboe

2009-06-27 03:47:16

by Fengguang Wu

[permalink] [raw]
Subject: Re: io-scheduler tuning for better read/write ratio

On Fri, Jun 26, 2009 at 06:44:06PM +0800, Jens Axboe wrote:
> What's the purpose of this patch? If you have requests pending you don't
> want to arm the idle timer and wait, you want to dispatch those.

You are right; please ignore this mindless hack.

Ralf, you can control the read/write ratio in the CFQ scheduler by
tuning the slice_sync/slice_async parameters.

For example,

echo 10 > /sys/block/sda/queue/iosched/slice_async
echo 100 > /sys/block/sda/queue/iosched/slice_sync

gives

-dsk/total-
read writ
66M 25M
65M 20M
49M 32M
84M 19M
46M 28M
61M 23M
55M 25M
67M 23M
76M 18M
46M 31M
56M 29M
54M 23M
76M 20M

while

echo 10 > /sys/block/sda/queue/iosched/slice_async
echo 300 > /sys/block/sda/queue/iosched/slice_sync

gives

-dsk/total-
read writ
102M 11M
82M 10M
100M 12M
86M 10M
95M 11M
102M 3168k
96M 11M
88M 10M
96M 12M

However, too large a slice_sync may not be desirable.

Thanks,
Fengguang

2009-06-29 09:49:38

by Ralf Gross

[permalink] [raw]
Subject: Re: io-scheduler tuning for better read/write ratio

Wu Fengguang wrote:
> Ralf, you can control the read/write ratio in the CFQ scheduler by
> tuning the slice_sync/slice_async parameters.
>
> For example,
>
> echo 10 > /sys/block/sda/queue/iosched/slice_async
> echo 100 > /sys/block/sda/queue/iosched/slice_sync
>
> gives
>
> -dsk/total-
> read writ
> 66M 25M
> 65M 20M
> 49M 32M
> 84M 19M
> 46M 28M
> 61M 23M
> 55M 25M
> 67M 23M
> 76M 18M
> 46M 31M
> 56M 29M
> 54M 23M
> 76M 20M


writing:

--dsk/md1--
_read _writ
0 150M
0 142M
0 143M
0 112M
0 141M
0 152M
0 132M
0 123M
0 149M


reading:

--dsk/md1--
_read _writ
143M 0
145M 0
160M 0
128M 0
148M 0
140M 0
158M 0
130M 0
122M 0

reading + writing:

--dsk/md1--
_read _writ
55M 76M
41M 83M
64M 81M
64M 83M
63M 68M
56M 117M
41M 61M
64M 87M
64M 69M
61M 87M
67M 81M
64M 33M
63M 68M
56M 76M



> while
>
> echo 10 > /sys/block/sda/queue/iosched/slice_async
> echo 300 > /sys/block/sda/queue/iosched/slice_sync
>
> gives
>
> -dsk/total-
> read writ
> 102M 11M
> 82M 10M
> 100M 12M
> 86M 10M
> 95M 11M
> 102M 3168k
> 96M 11M
> 88M 10M
> 96M 12M
>
> However, too large a slice_sync may not be desirable.

writing:

--dsk/md1--
_read _writ
0 131M
0 136M
0 145M
0 136M
0 128M
0 150M
0 127M
0 149M
0 127M
0 156M
0 125M
0 142M

reading:

--dsk/md1--
_read _writ
128M 0
160M 0
128M 0
128M 0
160M 0
128M 0
109M 0
128M 0
128M 0
160M 0
128M 0


writing:

--dsk/md1--
_read _writ
0 183M
0 142M
0 137M
0 147M
0 135M
0 147M
0 117M
0 135M
0 156M
0 120M
0 147M
0 135M

reading + writing:

--dsk/md1--
_read _writ
96M 40M
64M 38M
96M 29M
96M 24M
96M 31M
95M 35M
97M 26M
96M 23M
96M 33M
95M 73M
91M 25M


Thanks, this seems to be what I was looking for. I'll change the scheduler
parameters for all spool devices and run a backup with two concurrent
jobs. This will show me whether Bacula behaves the same way as the simple
dd test does.


Ralf