2002-09-25 17:15:41

by Jens Axboe

[permalink] [raw]
Subject: [PATCH] deadline io scheduler

Hi,

Due to recent "problems" (well, the vm being just too damn good at
keeping disks busy these days), it's become even more apparent that our
current io scheduler just cannot cope with some workloads. Repeated starvation
of reads is the most important one. The Andrew Morton Interactive
Workload (AMIW) [1] rates the current kernel poorly, on my test machine
it completes in 1-2 minutes depending on your luck. 2.5.38-BK does a lot
better, but mainly because it's being extremely unfair. This deadline io
scheduler finishes the AMIW in anywhere from ~0.5 seconds to ~3-4
seconds, depending on the io load.

I'd like folks to give it a test spin. Make two kernels, a 2.5.38
pristine and a 2.5.38 with this patch applied. Now beat on each of them,
while listening to mp3's. Or read mails and change folders. Or anything
else that gives you a feel for the interactiveness of the machine. Then
report your findings. I'm interested in _anything_.

There are a few tunables, but I'd suggest trying the defaults first.
Then experiment with these two:

static int read_expire = HZ / 2;

This defines the read expire time; the current default is 500ms.

static int writes_starved = 2;

This defines how many times reads can starve writes. 2 means that we can
do two rounds of reads for 1 write.
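
(For picking other values: there are HZ jiffies to a second, so HZ / 2 is
500ms, HZ / 4 is 250ms, and HZ / 10 is 100ms. writes_starved = 1 means
strictly alternating read and write batches, ie no read preference.)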

If you are curious how deadline-iosched works, search lkml archives for
previous announcements. I might make a new one if there's any
interest in a big detailed analysis, since there have been some
changes since the last release.

[1] Flush lots of stuff to disk (I start a dbench xxx, or do a dd
if=/dev/zero of=test_file bs=64k), and then time a cat dir/*.c where
dir/ holds lots of source files.

--
Jens Axboe



2002-09-26 06:10:53

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler


This is looking good. With a little more tuning and tweaking
this problem is solved.

The horror test was:

cd /usr/src/linux
dd if=/dev/zero of=foo bs=1M count=4000
sleep 5
time cat kernel/*.c > /dev/null

Testing on IDE (this matters - SCSI is very different)

- On 2.5.38 + souped-up VM it was taking 25 seconds.

- My read-latency patch took 1 second-odd.

- Linus' rework yesterday was taking 0.3 seconds.

- With Linus' current tree (with the deadline scheduler) it now takes
5 seconds.

Let's see what happens as we vary read_expire:

read_expire (ms)        time cat kernel/*.c (secs)
500                     5.2
400                     3.8
300                     4.5
200                     3.9
100                     5.1
 50                     5.0

well that was a bit of a placebo ;)

Let's leave read_expire at 500ms and diddle writes_starved:

writes_starved (units)  time cat kernel/*.c (secs)
 1                      4.8
 2                      4.4
 4                      4.0
 8                      4.9
16                      4.9


Now alter fifo_batch, everything else default:

fifo_batch (units)      time cat kernel/*.c (secs)
64                      5.0
32                      2.0
16                      0.2
 8                      0.17

OK, that's a winner.


Here's something really nice with the deadline scheduler. I was
madly catting five separate kernel trees (five reading processes)
and then started a big `dd', tunables at default:

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
0 9 0 6008 2460 8304 324716 0 0 2048 0 1102 254 13 88 0
0 7 0 6008 2600 8288 324480 0 0 1800 0 1114 266 0 100 0
0 6 0 6008 2452 8292 324520 0 0 2432 0 1126 287 29 71 0
0 6 0 6008 3160 8292 323952 0 0 3568 0 1132 312 0 100 0
0 6 0 6008 2860 8296 324148 128 0 2984 0 1119 281 17 83 0
1 6 0 5984 2856 8264 323816 352 0 5240 0 1162 479 0 100 0
0 7 1 5984 4152 7876 324068 0 0 1648 28192 1215 1572 1 99 0
0 9 2 6016 3136 7300 328568 0 180 1232 37248 1324 1201 3 97 0
0 9 2 6020 5260 5628 329212 0 4 1112 29488 1296 560 0 100 0
0 9 3 6020 3548 5596 330944 0 0 1064 35240 1302 629 6 94 0
0 9 3 6020 3412 5572 331352 0 0 744 31744 1298 452 6 94 0
0 9 2 6020 1516 5576 333352 0 0 888 31488 1283 467 0 100 0
0 9 2 6020 3528 5580 331396 0 0 1312 20768 1251 385 0 100 0

Note how the read rate only roughly halved, while we sustained a high
volume of writeback. This is excellent.


Let's try it again with fifo_batch at 16:

0 5 0 80 303936 3960 49288 0 0 2520 0 1092 174 0 100 0
0 5 0 80 302400 3996 50776 0 0 3040 0 1094 172 20 80 0
0 5 0 80 301164 4032 51988 0 0 2504 0 1082 150 0 100 0
0 5 0 80 299708 4060 53412 0 0 2904 0 1084 149 0 100 0
1 5 1 80 164640 4060 186784 0 0 1344 26720 1104 891 1 99 0
0 6 2 80 138900 4060 212088 0 0 280 7928 1039 226 0 100 0
0 6 2 80 134992 4064 215928 0 0 1512 7704 1100 226 0 100 0
0 6 2 80 130880 4068 219976 0 0 1928 9688 1124 245 17 83 0
0 6 2 80 123316 4084 227432 0 0 2664 8200 1125 283 11 89 0

That looks acceptable. Writes took quite a bit of punishment, but
the VM should cope with that OK.

It'd be interesting to know why read_expire and writes_starved have
no effect, while fifo_batch has a huge effect.

I'd like to gain a solid understanding of what these three knobs do.
Could you explain that a little more?

During development I'd suggest the below patch, to add
/proc/sys/vm/read_expire, fifo_batch and writes_starved - it beats
recompiling each time.

I'll test scsi now.



drivers/block/deadline-iosched.c | 18 +++++++++---------
kernel/sysctl.c | 12 ++++++++++++
2 files changed, 21 insertions(+), 9 deletions(-)

--- 2.5.38/drivers/block/deadline-iosched.c~akpm-deadline Wed Sep 25 22:16:36 2002
+++ 2.5.38-akpm/drivers/block/deadline-iosched.c Wed Sep 25 23:05:45 2002
@@ -24,14 +24,14 @@
* fifo_batch is how many steps along the sorted list we will take when the
* front fifo request expires.
*/
-static int read_expire = HZ / 2; /* 500ms start timeout */
-static int fifo_batch = 64; /* 4 seeks, or 64 contig */
-static int seek_cost = 16; /* seek is 16 times more expensive */
+int read_expire = HZ / 2; /* 500ms start timeout */
+int fifo_batch = 64; /* 4 seeks, or 64 contig */
+int seek_cost = 16; /* seek is 16 times more expensive */

/*
* how many times reads are allowed to starve writes
*/
-static int writes_starved = 2;
+int writes_starved = 2;

static const int deadline_hash_shift = 8;
#define DL_HASH_BLOCK(sec) ((sec) >> 3)
@@ -253,7 +253,7 @@ static void deadline_move_requests(struc
{
struct list_head *sort_head = &dd->sort_list[rq_data_dir(rq)];
sector_t last_sec = dd->last_sector;
- int batch_count = dd->fifo_batch;
+ int batch_count = fifo_batch;

do {
struct list_head *nxt = rq->queuelist.next;
@@ -267,7 +267,7 @@ static void deadline_move_requests(struc
if (rq->sector == last_sec)
batch_count--;
else
- batch_count -= dd->seek_cost;
+ batch_count -= seek_cost;

if (nxt == sort_head)
break;
@@ -319,7 +319,7 @@ dispatch:
* if we have expired entries on the fifo list, move some to dispatch
*/
if (deadline_check_fifo(dd)) {
- if (writes && (dd->starved++ >= dd->writes_starved))
+ if (writes && (dd->starved++ >= writes_starved))
goto dispatch_writes;

nxt = dd->read_fifo.next;
@@ -329,7 +329,7 @@ dispatch:
}

if (!list_empty(&dd->sort_list[READ])) {
- if (writes && (dd->starved++ >= dd->writes_starved))
+ if (writes && (dd->starved++ >= writes_starved))
goto dispatch_writes;

nxt = dd->sort_list[READ].next;
@@ -392,7 +392,7 @@ deadline_add_request(request_queue_t *q,
/*
* set expire time and add to fifo list
*/
- drq->expires = jiffies + dd->read_expire;
+ drq->expires = jiffies + read_expire;
list_add_tail(&drq->fifo, &dd->read_fifo);
}
}
--- 2.5.38/kernel/sysctl.c~akpm-deadline Wed Sep 25 22:59:48 2002
+++ 2.5.38-akpm/kernel/sysctl.c Wed Sep 25 23:05:42 2002
@@ -272,6 +272,9 @@ static int zero = 0;
static int one = 1;
static int one_hundred = 100;

+extern int fifo_batch;
+extern int read_expire;
+extern int writes_starved;

static ctl_table vm_table[] = {
{VM_OVERCOMMIT_MEMORY, "overcommit_memory", &sysctl_overcommit_memory,
@@ -314,6 +317,15 @@ static ctl_table vm_table[] = {
{VM_HUGETLB_PAGES, "nr_hugepages", &htlbpage_max, sizeof(int), 0644, NULL,
&proc_dointvec},
#endif
+ {90, "read_expire",
+ &read_expire, sizeof(read_expire), 0644,
+ NULL, &proc_dointvec},
+ {91, "fifo_batch",
+ &fifo_batch, sizeof(fifo_batch), 0644,
+ NULL, &proc_dointvec},
+ {92, "writes_starved",
+ &writes_starved, sizeof(writes_starved), 0644,
+ NULL, &proc_dointvec},
{0}
};



2002-09-26 06:28:39

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

From: Andrew Morton <[email protected]>
Date: Wed, 25 Sep 2002 23:15:58 -0700

I'd like to gain a solid understanding of what these three knobs do.
Could you explain that a little more?

My basic understanding of fifo_batch is:

1) fifo_batch is how many contiguous requests can be in
a "set"

2) we send out one write "set" for every two read "sets"

3) a seek works out to "seek_cost" contiguous requests,
cost wise, this gets subtracted from how many requests
the current "set" has left that are allowed to be used
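
A small userspace sketch of that accounting (not the kernel code; the
defaults and the cost rule come from the deadline-iosched code in this
thread, the request list is made up):

#include <stdio.h>

static int fifo_batch = 64;	/* deadline default: 4 seeks, or 64 contig */
static int seek_cost  = 16;	/* a seek costs as much as 16 contiguous requests */

#define RQ_SECTORS 8		/* pretend every request is 8 sectors long */

int main(void)
{
	/* start sectors of the queued requests, already in sorted order */
	unsigned long rq[] = { 100, 108, 5000, 9000, 13000, 17000, 21000 };
	int nr = sizeof(rq) / sizeof(rq[0]);
	unsigned long last_sec = 0;	/* sector the previous request ended at */
	int batch = fifo_batch, moved = 0, i;

	for (i = 0; i < nr && batch > 0; i++) {
		if (rq[i] == last_sec)
			batch--;		/* contiguous with the last one: cheap */
		else
			batch -= seek_cost;	/* implies a seek: expensive */
		last_sec = rq[i] + RQ_SECTORS;
		moved++;
	}
	printf("moved %d of %d requests in one batch, budget left %d\n",
	       moved, nr, batch);
	return 0;
}

With these made-up sectors it moves 5 of the 7 requests before the
budget runs out; drop fifo_batch to 16 and the very first seek already
ends the batch.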

2002-09-26 06:39:51

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Wed, Sep 25 2002, Andrew Morton wrote:
>
> This is looking good. With a little more tuning and tweaking
> this problem is solved.
>
> The horror test was:
>
> cd /usr/src/linux
> dd if=/dev/zero of=foo bs=1M count=4000
> sleep 5
> time cat kernel/*.c > /dev/null
>
> Testing on IDE (this matters - SCSI is very different)

Yes, SCSI specific stuff comes next.

> - On 2.5.38 + souped-up VM it was taking 25 seconds.
>
> - My read-latency patch took 1 second-odd.
>
> - Linus' rework yesterday was taking 0.3 seconds.
>
> - With Linus' current tree (with the deadline scheduler) it now takes
> 5 seconds.
>
> Let's see what happens as we vary read_expire:
>
> read_expire (ms) time cat kernel/*.c (secs)
> 500 5.2
> 400 3.8
> 300 4.5
> 200 3.9
> 100 5.1
> 50 5.0
>
> well that was a bit of a placebo ;)

For this work load, more on that later.

> Let's leave read_expire at 500ms and diddle writes_starved:
>
> writes_starved (units) time cat kernel/*.c (secs)
> 1 4.8
> 2 4.4
> 4 4.0
> 8 4.9
> 16 4.9

Interesting

> Now alter fifo_batch, everything else default:
>
> fifo_batch (units) time cat kernel/*.c (secs)
> 64 5.0
> 32 2.0
> 16 0.2
> 8 0.17
>
> OK, that's a winner.

Cool, I'm running benchmarks with 16 as the default now. I fear this
might be too aggressive, and that 32 will be a decent value.

> Here's something really nice with the deadline scheduler. I was
> madly catting five separate kernel trees (five reading processes)
> and then started a big `dd', tunables at default:
>
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 0 9 0 6008 2460 8304 324716 0 0 2048 0 1102 254 13 88 0
> 0 7 0 6008 2600 8288 324480 0 0 1800 0 1114 266 0 100 0
> 0 6 0 6008 2452 8292 324520 0 0 2432 0 1126 287 29 71 0
> 0 6 0 6008 3160 8292 323952 0 0 3568 0 1132 312 0 100 0
> 0 6 0 6008 2860 8296 324148 128 0 2984 0 1119 281 17 83 0
> 1 6 0 5984 2856 8264 323816 352 0 5240 0 1162 479 0 100 0
> 0 7 1 5984 4152 7876 324068 0 0 1648 28192 1215 1572 1 99 0
> 0 9 2 6016 3136 7300 328568 0 180 1232 37248 1324 1201 3 97 0
> 0 9 2 6020 5260 5628 329212 0 4 1112 29488 1296 560 0 100 0
> 0 9 3 6020 3548 5596 330944 0 0 1064 35240 1302 629 6 94 0
> 0 9 3 6020 3412 5572 331352 0 0 744 31744 1298 452 6 94 0
> 0 9 2 6020 1516 5576 333352 0 0 888 31488 1283 467 0 100 0
> 0 9 2 6020 3528 5580 331396 0 0 1312 20768 1251 385 0 100 0
>
> Note how the read rate maybe halved, and we sustained a high
> volume of writeback. This is excellent.

Yep

> Let's try it again with fifo_batch at 16:
>
> 0 5 0 80 303936 3960 49288 0 0 2520 0 1092 174 0 100 0
> 0 5 0 80 302400 3996 50776 0 0 3040 0 1094 172 20 80 0
> 0 5 0 80 301164 4032 51988 0 0 2504 0 1082 150 0 100 0
> 0 5 0 80 299708 4060 53412 0 0 2904 0 1084 149 0 100 0
> 1 5 1 80 164640 4060 186784 0 0 1344 26720 1104 891 1 99 0
> 0 6 2 80 138900 4060 212088 0 0 280 7928 1039 226 0 100 0
> 0 6 2 80 134992 4064 215928 0 0 1512 7704 1100 226 0 100 0
> 0 6 2 80 130880 4068 219976 0 0 1928 9688 1124 245 17 83 0
> 0 6 2 80 123316 4084 227432 0 0 2664 8200 1125 283 11 89 0
>
> That looks acceptable. Writes took quite a bit of punishment, but
> the VM should cope with that OK.
>
> It'd be interesting to know why read_expire and writes_starved have
> no effect, while fifo_batch has a huge effect.
>
> I'd like to gain a solid understanding of what these three knobs do.
> Could you explain that a little more?

Sure. The reason you are not seeing a big change with read_expire is
that you basically only have one thread issuing reads. Once you start
flooding the queue with more threads doing reads, read_expire just
puts a lid on the max latency that will be incurred. So you are probably
not hitting the read expire logic at all, or only slightly.

The three tunables are:

read_expire. This one controls how old a request can get before we
attempt to move it to the dispatch queue. This is the starvation logic
for the read list. When a read expires, the other knobs control what the
behaviour is.

fifo_batch. This one controls how big a batch of requests we move from
the sort lists to the dispatch queue. The idea was that we don't want to
move single requests, since that might cause seek storms. Instead we
move a batch of requests, starting at the expire head for reads if
necessary, along the sorted list to the dispatch queue. fifo_batch is
the total cost that can be endured, a total of seeks and non-seeky
requests. With your fifo_batch at 16, we can only move one seeky request
to the dispatch queue. Or we can move 16 non-seeky requests. Or a few
non-seeky requests, and then a seeky one. You get the idea.
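
(Concretely, with the default seek_cost of 16, fifo_batch = 64 buys four
seeky requests or 64 contiguous ones per batch, while fifo_batch = 16 is
spent by a single seeky request, which is presumably why your numbers
drop so sharply between 64 and 16.)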

writes_starved. This controls how many times reads get preferred over
writes. The default is 2, which means that we can serve two batches of
reads over one write batch. A value of 4 would mean that reads could
skip ahead of writes 4 times. A value of 1 would give you 1:1
read:write, ie no read preference. A silly value of 0 would give you
write preference, always.
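
A toy way to see that ratio (userspace sketch, not the kernel code; the
test mirrors the starved counter in the patch, the pending counts are
made up):

#include <stdio.h>

static int writes_starved = 2;	/* deadline default */

int main(void)
{
	int reads = 10, writes = 5;	/* pending batches of each kind */
	int starved = 0;

	while (reads || writes) {
		/* reads go first, but only writes_starved times in a row */
		if (reads && (!writes || starved++ < writes_starved)) {
			printf("R ");
			reads--;
		} else {
			printf("W ");
			writes--;
			starved = 0;
		}
	}
	printf("\n");
	return 0;
}

With the default of 2 this prints R R W R R W ...; set writes_starved
to 0 and all the writes go out first, which is the "write preference,
always" case above.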

Hope this helps?

> During development I'd suggest the below patch, to add
> /proc/sys/vm/read_expire, fifo_batch and writes_starved - it beats
> recompiling each time.

It sure does. I either want to talk Al into making the ioschedfs (a
better name will be selected :-) or try and do it myself so we can do
this properly.

> I'll test scsi now.

Cool. I found a buglet that causes incorrect accounting when moving
requests if the dispatch queue is not empty. Attached.

===== drivers/block/deadline-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/deadline-iosched.c Wed Sep 25 21:16:26 2002
+++ edited/drivers/block/deadline-iosched.c Thu Sep 26 08:33:35 2002
@@ -254,6 +254,15 @@
struct list_head *sort_head = &dd->sort_list[rq_data_dir(rq)];
sector_t last_sec = dd->last_sector;
int batch_count = dd->fifo_batch;
+
+ /*
+ * if dispatch is non-empty, disregard last_sector and check last one
+ */
+ if (!list_empty(dd->dispatch)) {
+ struct request *__rq = list_entry_rq(dd->dispatch->prev);
+
+ last_sec = __rq->sector + __rq->nr_sectors;
+ }

do {
struct list_head *nxt = rq->queuelist.next;

--
Jens Axboe

2002-09-26 06:54:45

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26 2002, Jens Axboe wrote:
> > Now alter fifo_batch, everything else default:
> >
> > fifo_batch (units) time cat kernel/*.c (secs)
> > 64 5.0
> > 32 2.0
> > 16 0.2
> > 8 0.17
> >
> > OK, that's a winner.
>
> Cool, I'm resting benchmarks with 16 as the default now. I fear this
> might be too agressive, and that 32 will be a decent value.

fifo_batch=16 drops throughput slightly on tiobench, however it also
gives really really good interactive behaviour here. Using 32 doesn't
change the throughput a whole lot. This might just be normal deviation
between runs; more runs are needed to be sure. Note that I'm testing
with the last_sec patch I posted, you should too.

BTW, for SCSI, it would be nice to first convert more drivers to use the
block level queued tagging. That would provide us with a much better
means to control starvation properly on SCSI as well.

--
Jens Axboe

2002-09-26 07:01:07

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26, 2002 at 08:59:51AM +0200, Jens Axboe wrote:
> BTW, for SCSI, it would be nice to first convert more drivers to use the
> block level queued tagging. That would provide us with a much better
> means to control starvation properly on SCSI as well.

Hmm, qlogicisp.c isn't really usable because the disks are too slow, it
needs bounce buffering, and nobody will touch the driver (and I don't
seem to be able to figure out what's going on with it myself), and the
FC stuff seems to need out-of-tree drivers to work. I wonder if some
help converting them to this might be found.


Thanks,
Bill

2002-09-26 07:18:07

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

From: William Lee Irwin III <[email protected]>
Date: Thu, 26 Sep 2002 00:06:15 -0700
> Hmm, qlogicisp.c isn't really usable because the disks are too
> slow, it needs bounce buffering, and nobody will touch the driver

On Thu, Sep 26, 2002 at 12:06:20AM -0700, David S. Miller wrote:
> I think it's high time to blow away qlogic{fc,isp}.c and put
> Matt Jacob's qlogic stuff into 2.5.x

Is this different from the v61b5 stuff? I can test it on my qla2310
and ISP1020 if need be.

The main issue with qlogicisp.c is that it's just not modern enough to
keep up with the rest of the system so testing with it is basically a
stress test for how things hold up with lots of highmem, lots of bounce
buffering and with a severely limited ability to perform disk I/O.

qlogicisp.c is also not very reflective of the hardware used in NUMA-Q
systems in the field, it just happened to be available from the scrap heap.


Thanks,
Bill

2002-09-26 07:07:47

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Andrew Morton wrote:
>
> I'll test scsi now.
>

aic7xxx, Fujitsu "MAF3364L SUN36G" (36G SCA-2)


Maximum number of TCQ tags=253

fifo_batch      time cat kernel/*.c (seconds)
64              58
32              54
16              20
 8              58
 4              1:15
 2              53

Maximum number of TCQ tags=4

fifo_batch      time cat kernel/*.c (seconds)
64              53
32              39
16              33
 8              21
 4              22
 2              36
 1              22


Maximum number of TCQ tags = 0:

fifo_batch      time cat kernel/*.c (seconds)
64              22
32              10.3
16              10.5
 8              5.5
 4              3.2
 2              1.9

I selected fifo_batch=16 and altered writes_starved and read_expire
again. They made no appreciable difference.

From this I can only conclude that my poor little read was stuck
in the disk for ages while TCQ busily allowed new incoming writes
to bypass already-sent reads.

A dreadful misdesign. Unless we can control this with barriers,
and if Fujitsu is typical, TCQ is just uncontrollable. I, for
one, would not turn it on in a pink fit.

2002-09-26 07:06:55

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

William Lee Irwin III wrote:
> Hmm, qlogicisp.c isn't really usable because the disks are too slow, it
> needs bounce buffering, and nobody will touch the driver (and I don't
> seem to be able to figure out what's going on with it myself), and the
> FC stuff seems to need out-of-tree drivers to work. I wonder if I some
> help converting them to this might be found.


I use this driver on my ancient ev56 Alpha, if you need me to do some
testing.

Unfortunately it is fragile and known to have obscure bugs... Compaq
was beating up on this driver for quite a while, but I never saw
anything but bandaids [and they fully admitted their fixes were bandaids].

There is an out-of-tree qlogic driver that is reported to be far better
-- but not necessarily close to Linux kernel coding style.

/me wonders if people are encouraged or scared off, at this point...

2002-09-26 07:12:27

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26 2002, Andrew Morton wrote:
> Andrew Morton wrote:
> >
> > I'll test scsi now.
> >
>
> aic7xxx, Fujitsu "MAF3364L SUN36G" (36G SCA-2)
>
>
> Maximum number of TCQ tags=253
>
> fifo_batch time cat kernel/*.c (seconds)
> 64 58
> 32 54
> 16 20
> 8 58
> 4 1:15
> 2 53
>
> Maximum number of TCQ tags=4
>
> fifo_batch time cat kernel/*.c (seconds)
> 64 53
> 32 39
> 16 33
> 8 21
> 4 22
> 2 36
> 1 22
>
>
> Maximum number of TCQ tags = 0:
>
> fifo_batch time cat kernel/*.c (seconds)
> 64 22
> 32 10.3
> 16 10.5
> 8 5.5
> 4 3.2
> 2 1.9
>
> I selected fifo_batch=16 and altered writes_starved and read_expires
> again. They made no appreciable difference.

Abysmal. BTW, a fifo_batch value less than the seek cost doesn't make
too much sense, unless the drive has really slow streaming io
performance.
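
(Going by the batch accounting in the patch, a single seek already costs
16, so any fifo_batch of 16 or less still moves exactly one seeky request
per batch; values below seek_cost only limit how long a contiguous run
can get.)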

> From this I can only conclude that my poor little read was stuck
> in the disk for ages while TCQ busily allowed new incoming writes
> to bypass already-sent reads.
>
> A dreadful misdesign. Unless we can control this with barriers,
> and if Fujutsu is typical, TCQ is just uncontrollable. I, for
> one, would not turn it on in a pink fit.

I have this dream that we might be able to control this if we get our
hands on the queueing at the block level. The above looks really really
bad though; in the past I've had quite good experience with a tag depth
of 4. I should try ide tcq again, to see how that goes.

--
Jens Axboe

2002-09-26 07:07:19

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

From: William Lee Irwin III <[email protected]>
Date: Thu, 26 Sep 2002 00:06:15 -0700

Hmm, qlogicisp.c isn't really usable because the disks are too
slow, it needs bounce buffering, and nobody will touch the driver

I think it's high time to blow away qlogic{fc,isp}.c and put
Matt Jacob's qlogic stuff into 2.5.x

2002-09-26 07:14:46

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

From: Jeff Garzik <[email protected]>
Date: Thu, 26 Sep 2002 03:16:32 -0400

David S. Miller wrote:
> I think it's high time to blow away qlogic{fc,isp}.c and put
> Matt Jacob's qlogic stuff into 2.5.x

Seconded. Thanks for remembering that name.

No problem :)

Has his stuff been cleaned up, code-wise, in the past few years? My
experience with it was 100% positive from a technical standpoint, but
negative from a style standpoint...

I think it'll be less work to toss his stuff into the tree
and have some janitor whack on it than try to get someone
to maintain what we have now.

2002-09-26 07:11:48

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

David S. Miller wrote:
> From: William Lee Irwin III <[email protected]>
> Date: Thu, 26 Sep 2002 00:06:15 -0700
>
> Hmm, qlogicisp.c isn't really usable because the disks are too
> slow, it needs bounce buffering, and nobody will touch the driver
>
> I think it's high time to blow away qlogic{fc,isp}.c and put
> Matt Jacob's qlogic stuff into 2.5.x


Seconded. Thanks for remembering that name.

Has his stuff been cleaned up, code-wise, in the past few years? My
experience with it was 100% positive from a technical standpoint, but
negative from a style standpoint...

Jeff, volunteering to test the QL-ISP



2002-09-26 07:10:03

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26, 2002 at 03:11:31AM -0400, Jeff Garzik wrote:
> I use this driver on my ancient ev56 Alpha, if you need me to do some
> testing.
> Unfortunately it is fragile and known to have obscure bugs... Compaq
> was beating up on this driver for quite a while, but I never saw
> anything but bandaids [and they fully admitted their fixes were bandaids].
> There is an out-of-tree qlogic driver that is reported to be far better
> -- but not necessarily close to Linux kernel coding style.
> /me wonders if people are encouraged or scared off, at this point...

I've got no idea what's going on with it. It just happens to explode when
parallel mkfs's are done. It looks like there's a bug where it can walk
off the end of an array when it gets an unexpected message but fixing
that doesn't help.


Thanks,
Bill

2002-09-26 07:28:35

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

David S. Miller wrote:
> I think it'll be less work to toss his stuff into the tree
> and have some janitor whack on it than try to get someone
> to maintain what we have now.


Does that mean you're volunteering to throw it into the tree? ;-)

Just dug up the URL, in case anybody is interested:
http://www.feral.com/isp.html

2002-09-26 07:29:38

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Hi,

I found a small problem where the hash would not contain the right
request state. Basically we updated the hash too soon; this bug was
introduced when the merge_cleanup stuff was removed.

It's not a big deal, it just means that the hash didn't catch as many
merges as it should. However, for efficiency it needs to be correct, of
course :-)

Current deadline against 2.5.38-BK attached.

===== drivers/block/deadline-iosched.c 1.1 vs edited =====
--- 1.1/drivers/block/deadline-iosched.c Wed Sep 25 21:16:26 2002
+++ edited/drivers/block/deadline-iosched.c Thu Sep 26 09:24:39 2002
@@ -25,7 +25,7 @@
* front fifo request expires.
*/
static int read_expire = HZ / 2; /* 500ms start timeout */
-static int fifo_batch = 64; /* 4 seeks, or 64 contig */
+static int fifo_batch = 32; /* 4 seeks, or 64 contig */
static int seek_cost = 16; /* seek is 16 times more expensive */

/*
@@ -164,7 +164,7 @@
*req = __rq;
q->last_merge = &__rq->queuelist;
ret = ELEVATOR_BACK_MERGE;
- goto out_ret;
+ goto out;
}
}

@@ -198,16 +198,18 @@
}

out:
- if (ret != ELEVATOR_NO_MERGE) {
- struct deadline_rq *drq = RQ_DATA(*req);
-
- deadline_del_rq_hash(drq);
- deadline_add_rq_hash(dd, drq);
- }
-out_ret:
return ret;
}

+static void deadline_merged_request(request_queue_t *q, struct request *req)
+{
+ struct deadline_data *dd = q->elevator.elevator_data;
+ struct deadline_rq *drq = RQ_DATA(req);
+
+ deadline_del_rq_hash(drq);
+ deadline_add_rq_hash(dd, drq);
+}
+
static void
deadline_merge_request(request_queue_t *q, struct request *req, struct request *next)
{
@@ -255,6 +257,15 @@
sector_t last_sec = dd->last_sector;
int batch_count = dd->fifo_batch;

+ /*
+ * if dispatch is non-empty, disregard last_sector and check last one
+ */
+ if (!list_empty(dd->dispatch)) {
+ struct request *__rq = list_entry_rq(dd->dispatch->prev);
+
+ last_sec = __rq->sector + __rq->nr_sectors;
+ }
+
do {
struct list_head *nxt = rq->queuelist.next;

@@ -544,6 +555,7 @@

elevator_t iosched_deadline = {
.elevator_merge_fn = deadline_merge,
+ .elevator_merged_fn = deadline_merged_request,
.elevator_merge_req_fn = deadline_merge_request,
.elevator_next_req_fn = deadline_next_request,
.elevator_add_req_fn = deadline_add_request,
===== drivers/block/elevator.c 1.27 vs edited =====
--- 1.27/drivers/block/elevator.c Thu Sep 26 08:23:11 2002
+++ edited/drivers/block/elevator.c Thu Sep 26 09:20:03 2002
@@ -250,6 +250,14 @@
return ELEVATOR_NO_MERGE;
}

+void elv_merged_request(request_queue_t *q, struct request *rq)
+{
+ elevator_t *e = &q->elevator;
+
+ if (e->elevator_merged_fn)
+ e->elevator_merged_fn(q, rq);
+}
+
void elv_merge_requests(request_queue_t *q, struct request *rq,
struct request *next)
{
===== drivers/block/ll_rw_blk.c 1.111 vs edited =====
--- 1.111/drivers/block/ll_rw_blk.c Thu Sep 26 08:23:11 2002
+++ edited/drivers/block/ll_rw_blk.c Thu Sep 26 09:23:05 2002
@@ -1606,6 +1606,7 @@
req->biotail = bio;
req->nr_sectors = req->hard_nr_sectors += nr_sectors;
drive_stat_acct(req, nr_sectors, 0);
+ elv_merged_request(q, req);
attempt_back_merge(q, req);
goto out;

@@ -1629,6 +1630,7 @@
req->sector = req->hard_sector = sector;
req->nr_sectors = req->hard_nr_sectors += nr_sectors;
drive_stat_acct(req, nr_sectors, 0);
+ elv_merged_request(q, req);
attempt_front_merge(q, req);
goto out;

===== include/linux/elevator.h 1.14 vs edited =====
--- 1.14/include/linux/elevator.h Thu Sep 26 08:23:11 2002
+++ edited/include/linux/elevator.h Thu Sep 26 09:25:14 2002
@@ -6,6 +6,8 @@

typedef void (elevator_merge_req_fn) (request_queue_t *, struct request *, struct request *);

+typedef void (elevator_merged_fn) (request_queue_t *, struct request *);
+
typedef struct request *(elevator_next_req_fn) (request_queue_t *);

typedef void (elevator_add_req_fn) (request_queue_t *, struct request *, struct list_head *);
@@ -19,6 +21,7 @@
struct elevator_s
{
elevator_merge_fn *elevator_merge_fn;
+ elevator_merged_fn *elevator_merged_fn;
elevator_merge_req_fn *elevator_merge_req_fn;

elevator_next_req_fn *elevator_next_req_fn;
@@ -42,6 +45,7 @@
extern int elv_merge(request_queue_t *, struct request **, struct bio *);
extern void elv_merge_requests(request_queue_t *, struct request *,
struct request *);
+extern void elv_merged_request(request_queue_t *, struct request *);
extern void elv_remove_request(request_queue_t *, struct request *);
extern int elv_queue_empty(request_queue_t *);
extern inline struct list_head *elv_get_sort_head(request_queue_t *, struct request *);

--
Jens Axboe

2002-09-26 07:38:26

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Jeff Garzik wrote:
> Just dug up the URL, in case anybody is interested:
> http://www.feral.com/isp.html

And I just noticed this:

The QLogic driver bundle is also now available under read-only BitKeeper
(see http://www.bitkeeper.com for information). The BK URL is:
bk://bitkeeper.feral.com:9002.

2002-09-26 07:36:14

by David Miller

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

From: Jeff Garzik <[email protected]>
Date: Thu, 26 Sep 2002 03:33:18 -0400

Just dug up the URL, in case anybody is interested:
http://www.feral.com/isp.html

Note there is a bitkeeper tree to pull from even :-)

2002-09-26 08:13:31

by William Lee Irwin III

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26, 2002 at 04:15:02PM +0800, Michael Clark wrote:
> What are people out there using with their QLA 2200/2300s?
> ~mc

I'm using the qla 61b5 release of the qla2xxx on a qla2310.
I've not tried Matt Jacob's drivers.


Cheers,
Bill

2002-09-26 08:09:53

by Michael Clark

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler



On 09/26/02 15:35, David S. Miller wrote:
> From: Jeff Garzik <[email protected]>
> Date: Thu, 26 Sep 2002 03:33:18 -0400
>
> Just dug up the URL, in case anybody is interested:
> http://www.feral.com/isp.html

Would be nice to have a stable qlogic driver in the main kernel.

Although last time I tried Matt Jacob's driver, it locked up
after 30 seconds of running bonnie. At least with Qlogic's
driver I can run bonnie and cerberus continuously for 2 weeks
with no problems (although this may have been because
Matt's driver ignored the command queue throttle set in the
Qlogic card's BIOS).

> Note there is a bitkeeper tree to pull from even :-)

The qlogic HBAs are a real problem in choosing which driver
to use out of:

in kernel qlogicfc
Qlogic's qla2x00 v4.x, v5.x, v6.x
Matthew Jacob's isp_mod

What are people out there using with their QLA 2200/2300s?

~mc

2002-09-26 08:22:50

by Daniel Pittman

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, 26 Sep 2002, Jens Axboe wrote:
> On Wed, Sep 25 2002, Andrew Morton wrote:

[...]

> writes_starved. This controls how many times reads get preferred over
> writes. The default is 2, which means that we can serve two batches of
> reads over one write batch. A value of 4 would mean that reads could
> skip ahead of writes 4 times. A value of 1 would give you 1:1
> read:write, ie no read preference. A silly value of 0 would give you
> write preference, always.

Actually, a value of zero doesn't sound completely silly to me, right
now, since I have been doing a lot of thinking about video capture
recently.

How much is it going to hurt a filesystem like ext[23] if that value is
set to zero while doing large streaming writes -- something like
(almost) uncompressed video at ten to twenty meg a second, for
gigabytes?

This is a situation where, for a dedicated machine, delaying reads
almost forever is actually a valuable thing. At least, valuable until it
stops the writes from being able to proceed.

Daniel

--
The best way to get a bad law repealed is to enforce it strictly.
-- Abraham Lincoln

2002-09-26 08:24:39

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26 2002, Daniel Pittman wrote:
> On Thu, 26 Sep 2002, Jens Axboe wrote:
> > On Wed, Sep 25 2002, Andrew Morton wrote:
>
> [...]
>
> > writes_starved. This controls how many times reads get preferred over
> > writes. The default is 2, which means that we can serve two batches of
> > reads over one write batch. A value of 4 would mean that reads could
> > skip ahead of writes 4 times. A value of 1 would give you 1:1
> > read:write, ie no read preference. A silly value of 0 would give you
> > write preference, always.
>
> Actually, a value of zero doesn't sound completely silly to me, right
> now, since I have been doing a lot of thinking about video capture
> recently.
>
> How much is it going to hurt a filesystem like ext[23] if that value is
> set to zero while doing large streaming writes -- something like
> (almost) uncompressed video at ten to twenty meg a second, for
> gigabytes?

You are going to stall all reads indefinitely :-)

> This is a situation where, for a dedicated machine, delaying reads
> almost forever is actually a valuable thing. At least, valuable until it
> stops the writes from being able to proceed.

Well 0 should achieve that quite fine

--
Jens Axboe

2002-09-26 15:04:40

by Rik van Riel

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, 26 Sep 2002, Daniel Pittman wrote:

> > read:write, ie no read preference. A silly value of 0 would give you
> > write preference, always.

> How much is it going to hurt a filesystem like ext[23] if that value is
> set to zero while doing large streaming writes -- something like
> (almost) uncompressed video at ten to twenty meg a second, for
> gigabytes?

It depends; if you've got 2 video streams to the same
filesystem and one needs to read a block bitmap in order
to allocate more disk blocks, you lose...

regards,

Rik
--
A: No.
Q: Should I include quotations after my reply?

http://www.surriel.com/ http://distro.conectiva.com/

2002-09-26 15:49:43

by Patrick Mansfield

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26, 2002 at 08:59:51AM +0200, Jens Axboe wrote:
> On Thu, Sep 26 2002, Jens Axboe wrote:
> BTW, for SCSI, it would be nice to first convert more drivers to use the
> block level queued tagging. That would provide us with a much better
> means to control starvation properly on SCSI as well.
>
> --
> Jens Axboe

I haven't looked closely at the block tagging, but for the FCP protocol,
there are no tags, just the type of queueing to use (task attributes)
- like ordered, head of queue, untagged, and some others. The tagging
is normally done on the adapter itself (FCP2 protocol AFAIK). Does this
mean block level queued tagging can't help FCP?

Maybe the same for iSCSI, other protocols, and pseudo adapters -
usb, ide, and raid adapters.

-- Patrick Mansfield

2002-09-26 17:36:52

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Michael Clark [[email protected]] wrote:
> The qlogic HBAs are a real problem in choosing which driver
> to use out of:
>
> in kernel qlogicfc
> Qlogic's qla2x00 v4.x, v5.x, v6.x
> Matthew Jacob's isp_mod
>

We have had good results using the Qlogic's driver. We are currently
running the v6.x version with Failover turned off on 23xx cards. We have
run a lot on > 4GB systems also.

-andmike
--
Michael Anderson
[email protected]

2002-09-26 17:58:39

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Mike Anderson wrote:
> We have had good results using the Qlogic's driver. We are currently
> running the v6.x version with Failover tunred off on 23xx cards. We have
> run a lot on > 4GB systems also.


Has anybody put work into cleaning this driver up?

The word from kernel hackers who work on it is that they would rather
write a new driver than spend weeks cleaning it up :/

2002-09-26 19:15:13

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Jeff Garzik [[email protected]] wrote:
> Has anybody put work into cleaning this driver up?
>
> The word from kernel hackers that work on it is, they would rather write
> a new driver than spend weeks cleaning it up :/
>

Andrew Vasquez from Qlogic can provide more detailed comments on deltas
between the versions of the driver.

The v6.x driver is cleaner and supports newer kernel interfaces than
past versions did.


-andmike
--
Michael Anderson
[email protected]

2002-09-26 20:15:53

by Thomas Tonino

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Michael Clark wrote:

> Although last time i tried Matt Jabob's driver, it locked up
> after 30 seconds of running bonnie. At least with Qlogic's
> driver I can run bonnie and cerberus continuously for 2 weeks
> with no problems (although this may have been because
> Matt's driver ignored the command queue throttle set in the
> qlogic cards BIOS).

My experience with a JBOD box is the in-kernel driver locking up with the
"no handle slots, this should not happen" message within half an hour of
running a 4 MB/sec write load.

Then tried the feral.com driver. That one was stable with the same load. Ran
that one for a month or two.

Then came along the highio patch in -AA. Made me want to switch to the in kernel
qlogic driver again. This was a good time to try a patch by Andrew Patterson,
AFAIR upping the number of slots to 255 and fixing the calculations around them.
This has been running without problems for a few months now.

The patch has been posted to the list. It can be found at
http://groups.google.com/groups?selm=linux.scsi.1019759258.2413.1.camel%40lvadp.fc.hp.com

> The qlogic HBAs are a real problem in choosing which driver
> to use out of:
>
> in kernel qlogicfc
> Qlogic's qla2x00 v4.x, v5.x, v6.x
> Matthew Jacob's isp_mod

I never tried Qlogic's driver, probably because of all the versions floating around.


Thomas


2002-09-26 22:12:52

by Matt Porter

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26, 2002 at 02:03:19PM -0400, Jeff Garzik wrote:
> Mike Anderson wrote:
> > We have had good results using the Qlogic's driver. We are currently
> > running the v6.x version with Failover tunred off on 23xx cards. We have
> > run a lot on > 4GB systems also.
>
>
> Has anybody put work into cleaning this driver up?
>
> The word from kernel hackers that work on it is, they would rather write
> a new driver than spend weeks cleaning it up :/

I added Mark Bellon to this since he has spent a lot of time working
with QLogic to get this cleaned up for the OSDL tree. He can probably
address some specific questions.

Regards,
--
Matt Porter
[email protected]
This is Linux Country. On a quiet night, you can hear Windows reboot.

2002-09-26 22:30:22

by Mark Bellon

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Matt Porter wrote:

>On Thu, Sep 26, 2002 at 02:03:19PM -0400, Jeff Garzik wrote:
>
>
>>Mike Anderson wrote:
>>
>>
>>>We have had good results using the Qlogic's driver. We are currently
>>>running the v6.x version with Failover tunred off on 23xx cards. We have
>>>run a lot on > 4GB systems also.
>>>
>>>
>>Has anybody put work into cleaning this driver up?
>>
>>The word from kernel hackers that work on it is, they would rather write
>>a new driver than spend weeks cleaning it up :/
>>
>>
>
>I added Mark Bellon to this since he has spent a lot of time working
>with QLogic to get this cleaned up for the OSDL tree. He can probably
>address some specific questions.
>
I fought with them for quite some time to get the major rewrite that
occurred between level 5 and level 6. The level 6 driver in TLT and
OSDL is a version that has many of my suggestions and a few
enhancements in it. It is "much better than a stick in the eye". We
should now be in sync with their releases. I haven't looked recently
to see if there is something newer than the one I checked in.

It still has a long way to go. I have threatened to rewrite it more
than once. However, there is a plan to get Qlogic to do all of this
and the presentation keeps getting put off. It needs to be rewritten
for the "so called" hardened driver stuff and that would be a good
juncture to make the rewrite happen.

Hacking the driver source is next to useless - Qlogic releases from
their own tree constantly. They are OK about taking things back but
are a bit slow.

I can help with any suggested changes and cleanups.

mark



2002-09-26 23:18:30

by Daniel Pittman

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, 26 Sep 2002, Jens Axboe wrote:
> On Thu, Sep 26 2002, Daniel Pittman wrote:
>> On Thu, 26 Sep 2002, Jens Axboe wrote:
>> > On Wed, Sep 25 2002, Andrew Morton wrote:
>>
>> [...]
>>
>> > writes_starved. This controls how many times reads get preferred
>> > over writes. The default is 2, which means that we can serve two
>> > batches of reads over one write batch. A value of 4 would mean that
>> > reads could skip ahead of writes 4 times. A value of 1 would give
>> > you 1:1 read:write, ie no read preference. A silly value of 0 would
>> > give you write preference, always.
>>
>> Actually, a value of zero doesn't sound completely silly to me, right
>> now, since I have been doing a lot of thinking about video capture
>> recently.
>>
>> How much is it going to hurt a filesystem like ext[23] if that value
>> is set to zero while doing large streaming writes -- something like
>> (almost) uncompressed video at ten to twenty meg a second, for
>> gigabytes?
>
> You are going to stalll all reads indefinately :-)

Which has some potentially fatal consequences, really, if any of the
capture code gets paged out before the streaming write starts, or if the
filesystem needs to read a bitmap block or so, as Rik points out.

>> This is a situation where, for a dedicated machine, delaying reads
>> almost forever is actually a valuable thing. At least, valuable until
>> it stops the writes from being able to proceed.
>
> Well 0 should achieve that quite fine

Would you consider allowing something akin to 'writes_starved = -4' to
allow writes to bypass reads only 4 times -- a preference for writes,
but not forever?

That's going to express the bias I (think I) want for this case, but
it's not going to be able to stall a read forever...

Daniel

--
It is quite humbling to realize that the storage occupied by the longest line
from a typical Usenet posting is sufficient to provide a state space so vast
that all the computation power in the world can not conquer it.
-- Dave Wallace

2002-09-27 05:46:45

by Andrew Vasquez

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, 26 Sep 2002, Mike Anderson wrote:

> Jeff Garzik [[email protected]] wrote:
> > Has anybody put work into cleaning this driver up?
> >
> > The word from kernel hackers that work on it is, they would rather write
> > a new driver than spend weeks cleaning it up :/
> >
>
> Andrew Vasquez from Qlogic can provide more detailed comments on deltas
> between the versions of the driver.
>
> The v6.x driver is cleaner and supporting newer kernel interfaces than
> past versions.
>
All,

I believe we have made some significant progress over the past few
months at delivering a reasonably stable and maintainable device
driver for ISP2100/ISP22xx/ISP23xx chips. Our original goals for the
6.x series driver included:

o Stability
- Failover
- Fabric topologies
- Kernel interface integrations

o Maintainability
- Code sanitization!
- Strip dead-code and support for kernels < 2.4.

o Feature integrations
- Fast-path streamlining
- RIO/ZIO
- IP via FC
- ISP2100 support
- ...

Note:
Most if not all of Arjan van de Ven's (Redhat) changes that are in
later RH kernel errata releases (addon/qla2200) have made it into the
6.x series code. Much thanks goes out to Arjan for his work, not just
at the technical level, but also, the impact his work had on reshaping
the landscape of attitudes and direction within the Linux Driver Group
at QLogic.

The formal release of 6.01.00 has been completed and should be available
for download 'real soon now' (as it appears 6.01b5 is still the latest
6.x series driver available) -- package has been forwarded to the
website maintainer. Notable changes from the 6.00 release include:

o ISP2100 support

o IP via FC support (RFC 2625)

o General code-sanitizing
- locking structures
- queue structures
- extraneous NOP*LOCK/UNLOCK macros
- remove old EH routines
- remove serial console routines

o Bug-fixes.

Our current mode of operation for 6.x series work is: choose a driver
subsystem (i.e. command posting, fabric support, failover, or
post-command processing), rehash assumptions and requirements, review
its design and role in the driver, retool or reimplement, and test. For
example, changes made to the source-tip since 6.01 include:

o A complete rewrite of qla2x00_64bit_start_scsi()
- fix 4gb page boundary limitation
- correct endian-ness during IOCB preperation
- simplification

o Additional 64bit DMA fixes

o Additional 2.5 support fixes

o More bug-fixes

There is still a lot of challenging work to be done, and perhaps now
would be a good time to ask the community what they need and would
like to see happen with the QLogic driver. I'll start off with a brief
list of 'important' TODOs we've compiled:

o Interrupt handler cleanup

o ZIO/RIO patches for 23xx

o Continue support for kernel 2.5 and above

o Adding support for PCI-HOT plug
- complete pci_driver interface

o Fabric management
- Use login-IOCBs instead of mailbox command
- GNFT support

o SNIA API support (version 2.0)

o Complete command posting module.

o Alternative persistent binding method (non modules.conf based)

o Failover processing simplification

o VI support

I hope this helps to clear up some of the haze and ambiguity
surrounding QLogic's work with the 6.x series driver, and perhaps, at
the same time, provides a medium for discussion regarding the 6.x
series driver.

--
Andrew Vasquez | [email protected] |
I prefer an accomidating vice to an obstinate virtue
DSS: 0x508316BB, FP: 79BD 4FAC 7E82 FF70 6C2B 7E8B 168F 5529 5083 16BB



2002-09-27 05:53:13

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Andrew Vasquez wrote:
> I hope this helps to clearup some of the haze and ambiguity
> surrounding QLogic's work with the 6.x series driver, and perhaps
> at the same time, prepares a medium for discussion regarding the 6.x
> series driver.


Wow, thanks for all that information, and it's great that you've
integrated Arjan's work and feedback.

There is one big question left unanswered... Where can the source for
the latest version with all this wonderful stuff be found? :) I don't
see a URL even for 6.01b5.

Jeff



2002-09-27 15:57:03

by Andrew Vasquez

[permalink] [raw]
Subject: RE: [PATCH] deadline io scheduler

> Andrew Vasquez wrote:
> > I hope this helps to clearup some of the haze and ambiguity
> > surrounding QLogic's work with the 6.x series driver, and perhaps
> > at the same time, prepares a medium for discussion regarding the 6.x
> > series driver.
>
> Wow, thanks for all that information, and it's great that you've
> integrated Arjan's work and feedback.
>
> There is one big question left unanswered... Where can the
> source for
> the latest version with all this wonderful stuff be found?
> :) I don't
> see a URL even for 6.01b5.
>
Sure, the 6.01b5 tarball can be found at:

http://download.qlogic.com/drivers/5642/qla2x00-v6.1b5-dist.tgz

In general all QLogic drivers are available from the following URL:

http://www.qlogic.com/support/drivers_software.asp

In my mind, a larger question is determining a balance between the
'Release Early, release often' mantra of Linux development and the
'kinder, more conservative pace' of business. For example, if we
cannot set up a 'patch/pre-beta' website locally at QLogic, I've
considered starting a SourceForge project or hosting it locally
through my ISP.

Regards,
Andrew Vasquez

2002-09-27 16:57:18

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Jeff Garzik [[email protected]] wrote:
> Wow, thanks for all that information, and it's great that you've
> integrated Arjan's work and feedback.
>
> There is one big question left unanswered... Where can the source for
> the latest version with all this wonderful stuff be found? :) I don't
> see a URL even for 6.01b5.

In case you did not already get the URL:

http://download.qlogic.com/drivers/5642/qla2x00-v6.1b5-dist.tgz

-andmike
--
Michael Anderson
[email protected]

2002-09-27 17:01:30

by Mike Anderson

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Andrew Vasquez [[email protected]] wrote:
> In my mind, a larger question is determining a balance between the
> 'Release Early, release often' mantra of Linux development and the
> 'kinder, more conservative pace' of business. For example, If we
> cannot setup a 'patch/pre-beta' web-site locally at QLogic, I've
> considered starting a SourceForge project or hosting it locally
> through my ISP.

Currently the release method of not providing patches against older
releases of the driver, or even an archive of the full older release,
has resulted in others having to duplicate this functionality.

It would be great if you could set up a patch site.


-andmike
--
Michael Anderson
[email protected]

2002-09-30 08:10:08

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Thu, Sep 26 2002, Patrick Mansfield wrote:
> On Thu, Sep 26, 2002 at 08:59:51AM +0200, Jens Axboe wrote:
> > On Thu, Sep 26 2002, Jens Axboe wrote:
> > BTW, for SCSI, it would be nice to first convert more drivers to use the
> > block level queued tagging. That would provide us with a much better
> > means to control starvation properly on SCSI as well.
> >
> > --
> > Jens Axboe
>
> I haven't look closely at the block tagging, but for the FCP protocol,
> there are no tags, just the type of queueing to use (task attributes)
> - like ordered, head of queue, untagged, and some others. The tagging
> is normally done on the adapter itself (FCP2 protocol AFAIK). Does this
> mean block level queued tagging can't help FCP?

The generic block level tagging is nothing more than tag management. It
can 'tag' a request (assigning it an integer tag), and later let you
locate that request by giving it the tag.
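
As a toy model of what that means (plain userspace C, not the actual
block layer interface; the helper names here are made up):

#include <stdio.h>

#define MAX_TAGS 4

struct request { int id; };

static struct request *tag_map[MAX_TAGS];

/* give rq a free integer tag; returns the tag, or -1 if all tags are busy */
static int assign_tag(struct request *rq)
{
	int tag;

	for (tag = 0; tag < MAX_TAGS; tag++)
		if (!tag_map[tag]) {
			tag_map[tag] = rq;
			return tag;
		}
	return -1;
}

/* when the device completes a tag, find the request it belonged to */
static struct request *find_tag(int tag)
{
	return (tag >= 0 && tag < MAX_TAGS) ? tag_map[tag] : NULL;
}

int main(void)
{
	struct request rq = { 42 };
	int tag = assign_tag(&rq);

	printf("request %d was assigned tag %d\n", find_tag(tag)->id, tag);
	tag_map[tag] = NULL;	/* completion frees the tag for reuse */
	return 0;
}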

I suspect you need none of that for FCP. Instead it looks more like you
can set the task attributes based on the type of request itself. So you
would currently set 'ordered' for a request with REQ_BARRIER set. And
you could set 'head of queue' for REQ_URGENT (I'm making this one up
:-), etc.

Do you need any request management to deal with FCP queueing? It doesn't
sound like it.

--
Jens Axboe

2002-09-30 08:05:54

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Fri, Sep 27 2002, Daniel Pittman wrote:
> >> This is a situation where, for a dedicated machine, delaying reads
> >> almost forever is actually a valuable thing. At least, valuable until
> >> it stops the writes from being able to proceed.
> >
> > Well 0 should achieve that quite fine
>
> Would you consider allowing something akin to 'writes_starved = -4' to
> allow writes to bypass reads only 4 times -- a preference for writes,
> but not forever?

Sure yes, that would be an acceptable solution.

--
Jens Axboe

2002-09-30 15:34:51

by Patrick Mansfield

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Mon, Sep 30, 2002 at 10:15:22AM +0200, Jens Axboe wrote:
> On Thu, Sep 26 2002, Patrick Mansfield wrote:

> > I haven't look closely at the block tagging, but for the FCP protocol,
> > there are no tags, just the type of queueing to use (task attributes)
> > - like ordered, head of queue, untagged, and some others. The tagging
> > is normally done on the adapter itself (FCP2 protocol AFAIK). Does this
> > mean block level queued tagging can't help FCP?
>
> The generic block level tagging is nothing more than tag management. It
> can 'tag' a request (assigning it an integer tag), and later let you
> locate that request by giving it the tag.
>
> I suspect you need none of that for FCP. Instead it looks more like you
> can set the task attributes based on the type of request itself. So you
> would currently set 'ordered' for a request with REQ_BARRIER set. And
> you could set 'head of queue' for REQ_URGENT (I'm making this one up
> :-), etc.
>
> Do you need any request management to deal with FCP queueing? It doesn't
> sound like it.

No.

OK I understand it now - if someone wants to put barrier support in an FCP
adapter driver something like we have in scsi_populate_tag_msg() would be
useful, an inline or macro like:

static inline int scsi_is_ordered(Scsi_Cmnd *SCpnt)
{
	if (SCpnt->request->flags & REQ_BARRIER)
		return 1;
	else
		return 0;
}

-- Patrick Mansfield

2002-09-30 16:02:47

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Mon, Sep 30 2002, Patrick Mansfield wrote:
> On Mon, Sep 30, 2002 at 10:15:22AM +0200, Jens Axboe wrote:
> > On Thu, Sep 26 2002, Patrick Mansfield wrote:
>
> > > I haven't look closely at the block tagging, but for the FCP protocol,
> > > there are no tags, just the type of queueing to use (task attributes)
> > > - like ordered, head of queue, untagged, and some others. The tagging
> > > is normally done on the adapter itself (FCP2 protocol AFAIK). Does this
> > > mean block level queued tagging can't help FCP?
> >
> > The generic block level tagging is nothing more than tag management. It
> > can 'tag' a request (assigning it an integer tag), and later let you
> > locate that request by giving it the tag.
> >
> > I suspect you need none of that for FCP. Instead it looks more like you
> > can set the task attributes based on the type of request itself. So you
> > would currently set 'ordered' for a request with REQ_BARRIER set. And
> > you could set 'head of queue' for REQ_URGENT (I'm making this one up
> > :-), etc.
> >
> > Do you need any request management to deal with FCP queueing? It doesn't
> > sound like it.
>
> No.
>
> OK I understand it now - if someone wants to put barrier support in an FCP
> adapter driver something like we have in scsi_populate_tag_msg() would be
> useful, an inline or macro like:
>
> static inline int scsi_is_ordered(Scsi_Cmnd *SCpnt)
> {
> 	if (SCpnt->request->flags & REQ_BARRIER)
> 		return 1;
> 	else
> 		return 0;
> }

Exactly

--
Jens Axboe

2002-10-01 22:32:39

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

Hi!

> Due to recent "problems" (well the vm being just too damn good at keep
> disks busy these days), it's become even more apparent that our current
> io scheduler just cannot cope with some work loads. Repeated starvartion
> of reads is the most important one. The Andrew Morton Interactive
> Workload (AMIW) [1] rates the current kernel poorly, on my test machine
> it completes in 1-2 minutes depending on your luck. 2.5.38-BK does a lot
> better, but mainly because it's being extremely unfair. This deadline io
> scheduler finishes the AMIW in anywhere from ~0.5 seconds to ~3-4
> seconds, depending on the io load.

would it be possible to make deadlines per-process to introduce ionice?

ionice -n -5 mpg123 foo.mp3
ionice make

Pavel

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.

2002-10-02 05:29:45

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] deadline io scheduler

On Mon, Sep 30 2002, Pavel Machek wrote:
> Hi!
>
> > Due to recent "problems" (well the vm being just too damn good at keep
> > disks busy these days), it's become even more apparent that our current
> > io scheduler just cannot cope with some work loads. Repeated starvartion
> > of reads is the most important one. The Andrew Morton Interactive
> > Workload (AMIW) [1] rates the current kernel poorly, on my test machine
> > it completes in 1-2 minutes depending on your luck. 2.5.38-BK does a lot
> > better, but mainly because it's being extremely unfair. This deadline io
> > scheduler finishes the AMIW in anywhere from ~0.5 seconds to ~3-4
> > seconds, depending on the io load.
>
> would it be possible to make deadlines per-process to introduce ionice?
>
> ionice -n -5 mpg123 foo.mp3
> ionice make

Yes, it would be possible, and at least for reads it doesn't require too
many changes to the deadline scheduler. There's even someone working on
it; expect something to play with soon. It bases the io priority on the
process nice levels.

--
Jens Axboe