2001-02-20 22:42:20

by Peter T. Breuer

Subject: plugging in 2.4. Does it work?

More like "how does one get it to work".

Does anyone have a simple recipe for doing plugging right in 2.4?
I'm doing something wrong.

When I disable plugging on my block driver (by registering a no-op
plugging function), the driver works fine. In particular my end_request
code works fine - it does an "if end_that_request_first, return;
end_that_request_last" on the request to murder. Here it is:

int my_end_request(struct request *req)
{
        unsigned long flags;
        int dequeue = 0;

        spin_lock_irqsave(&io_request_lock, flags);
        if (!req->errors) {
                while (req->nr_sectors > 0) {
                        printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
                               req->nr_sectors);
                        if (!end_that_request_first(req, !req->errors, DEVICE_NAME))
                                break;
                }
        }
        printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
               req->nr_sectors);
        if (!end_that_request_first(req, !req->errors, DEVICE_NAME)) {
                printk(KERN_DEBUG "running end_last on req with %lu sectors\n",
                       req->nr_sectors);
                end_that_request_last(req);
                dequeue = 1;
        }
        spin_unlock_irqrestore(&io_request_lock, flags);
        return dequeue;
}

When I allow the kernel to use its default plugging function and
enable read-ahead of 20, I see read requests being aggregated
10-in-one with 1K blocks, but the userspace request hangs, or
the calling process dies! Or worse. Sometimes I get some
recordable logs .. and they show nothing wrong:

Feb 20 10:46:42 barney kernel: running end_first on req with 20 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 18 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 16 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 14 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 12 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 10 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 8 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 6 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 4 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 2 sectors
Feb 20 10:46:42 barney kernel: running end_first on req with 2 sectors
Feb 20 10:46:42 barney kernel: running end_last on req with 2 sectors
Feb 20 10:52:47 barney kernel: running end_first on req with 2 sectors
Feb 20 10:52:47 barney kernel: running end_first on req with 2 sectors
Feb 20 10:52:47 barney kernel: running end_last on req with 2 sectors

But death follows sooner rather than later.

I've discovered that

1) setting read-ahead to 0 disables request aggregation by some means of
which I am not aware, and everything goes hunky dory.

2) setting read-ahead to 4 or 8 seems to be safe. I see 4K requests
being formed and treated OK.

3) disabling plugging stops request aggregation and makes everything
safe.

Any clues? Is the trick just "powers of 2"? How is one supposed to
handle plugging? Where is the canonical example? I can't see any driver
that does it.

Peter


2001-02-20 22:54:41

by Jens Axboe

Subject: Re: plugging in 2.4. Does it work?

On Tue, Feb 20 2001, Peter T. Breuer wrote:
> More like "how does one get it to work".
>
> Does anyone have a simple recipe for doing plugging right in 2.4?
> I'm doing something wrong.
>
> When I disable plugging on my block driver (by registering a no-op
> plugging function), the driver works fine. In particular my end_request
> code works fine - it does an "if end_that_request_first, return;
> end_that_request_last" on the request to murder. Here it is:
>
> int my_end_request(struct request *req)
> {
>         unsigned long flags;
>         int dequeue = 0;
>
>         spin_lock_irqsave(&io_request_lock, flags);
>         if (!req->errors) {
>                 while (req->nr_sectors > 0) {
>                         printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
>                                req->nr_sectors);
>                         if (!end_that_request_first(req, !req->errors, DEVICE_NAME))
>                                 break;
>                 }
>         }
>         printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
>                req->nr_sectors);
>         if (!end_that_request_first(req, !req->errors, DEVICE_NAME)) {
>                 printk(KERN_DEBUG "running end_last on req with %lu sectors\n",
>                        req->nr_sectors);
>                 end_that_request_last(req);
>                 dequeue = 1;
>         }
>         spin_unlock_irqrestore(&io_request_lock, flags);
>         return dequeue;
> }

Could this snippet possibly become more unreadable :-)
Firstly, I hope that the dequeue var does not return whether the
request should be dequeued or not. Because if you do it after
end_that_request_last, you are totally screwing the request
lists. Maybe this is what's going wrong?

> I've discovered that
>
> 1) setting read-ahead to 0 disables request aggregation by some means of
> which I am not aware, and everything goes hunky dory.

Most likely what you are seeing happen is that we will do a
wait_on_buffer before we have a chance to merge this request on
the queue. Do writes, and you'll see lots of merging.

> 2) setting read-ahead to 4 or 8 seems to be safe. I see 4K requests
> being formed and treated OK.
>
> 3) disabling plugging stops request aggregation and makes everything
> safe.
>
> Any clues? Is the trick just "powers of 2"? How is one supposed to
> handle plugging? Where is the canonical example? I can't see any driver
> that does it.

There's no trick, and no required values. And there's really no special
trick to handling clustered requests. Either you are doing scatter
gather and setup your sg tables by going through the complete buffer
list on the request, or you are just transferring to rq->buffer the
amount specified by current_nr_sectors. That's it. Really.
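
For the simple (non-sg) case, that amounts to roughly the sketch below, run
from the request function for a device that can complete transfers
synchronously (in 2.4 the request function is entered with io_request_lock
already held). my_transfer() is a hypothetical stand-in for the driver's own
data mover; the rest is the stock 2.4 request API:

static void my_do_request(request_queue_t *q)
{
        struct request *req;

        while (!list_empty(&q->queue_head)) {
                req = blkdev_entry_next_request(&q->queue_head);

                /* req->buffer and req->current_nr_sectors describe only the
                 * first buffer_head of the (possibly clustered) request */
                my_transfer(req->sector, req->buffer,
                            req->current_nr_sectors, req->cmd == WRITE);

                /* returns nonzero while more buffer_heads remain, and
                 * advances buffer, sector and current_nr_sectors to the next */
                if (!end_that_request_first(req, 1, DEVICE_NAME)) {
                        blkdev_dequeue_request(req);
                        end_that_request_last(req);
                }
        }
}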

--
Jens Axboe

2001-02-20 22:58:53

by Jens Axboe

Subject: Re: plugging in 2.4. Does it work?

On Tue, Feb 20 2001, Peter T. Breuer wrote:
> int my_end_request(struct request *req)
> {
>         unsigned long flags;
>         int dequeue = 0;
>
>         spin_lock_irqsave(&io_request_lock, flags);
>         if (!req->errors) {
>                 while (req->nr_sectors > 0) {
>                         printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
>                                req->nr_sectors);
>                         if (!end_that_request_first(req, !req->errors, DEVICE_NAME))
>                                 break;
>                 }
>         }
>         printk(KERN_DEBUG "running end_first on req with %lu sectors\n",
>                req->nr_sectors);
>         if (!end_that_request_first(req, !req->errors, DEVICE_NAME)) {
>                 printk(KERN_DEBUG "running end_last on req with %lu sectors\n",
>                        req->nr_sectors);
>                 end_that_request_last(req);
>                 dequeue = 1;
>         }
>         spin_unlock_irqrestore(&io_request_lock, flags);
>         return dequeue;
> }

Forgot to mention that the above doesn't make much sense at all. If
there are no errors, you loop through ending all the buffers. Then
you fall through and end the first (non-existent) chunk? And
end_that_request_first does not need to hold the io_request_lock,
you can move that down to protect end_that_request_last.

--
Jens Axboe

2001-02-20 23:28:11

by Peter T. Breuer

Subject: Re: plugging in 2.4. Does it work?

"A month of sundays ago Jens Axboe wrote:"
> On Tue, Feb 20 2001, Peter T. Breuer wrote:
> > More like "how does one get it to work".
[snip muddy end_request code]

Probably. I decided that accuracy might get a better response, though
I did have to expand the macros to get to this. It's really:

io_spin_lock
while (end_that_request_first(req, ...))
        ;
// one more time for luck
if (!end_that_request_first(req, ...))
        end_that_request_last(req);
io_spin_unlock

> Firstly, I hope that the dequeue var does not return whether the
> request should be dequeued or not. Because if you do it after

Actually I ignore the return value at present. I just wanted to know what
happened. I haven't the faintest idea whether running end_that_request_last
MEANS anything.

> end_that_request_last, you are totally screwing the request
> lists. Maybe this is what's going wrong?

At the time that end_request is run, it's on my own queue, not on the
kernel's queue. But I am eager for insight ... are you saying that
after end_that_request_last has run, all bets are off, because the
thing is completely vamooshed in every possible sense? I guess so.
But fear not ... I've already taken it off my queue too. I really
wanted the dequeue return value just in case maybe it would mean that
I'd have to put it back on my queue and have another attempt at
acking it.

> > I've discovered that
> >
> > 1) setting read-ahead to 0 disables request aggregation by some means of
> > which I am not aware, and everything goes hunky dory.
>
> Most likely what you are seeing happen is that we will do a
> wait_on_buffer before we have a chance to merge this request on
> the queue. Do writes, and you'll see lots of merging.

OK ... that sounds like something to avoid for a while! Wait_on_buffer,
eh? If it makes things safe, I'm all for trying it myself!

> > 2) setting read-ahead to 4 or 8 seems to be safe. I see 4K requests
> > being formed and treated OK.
> >
> > 3) disabling plugging stops request aggregation and makes everything
> > safe.
> >
> > Any clues? Is the trick just "powers of 2"? How is one supposed to
> > handle plugging? Where is the canonical example? I can't see any driver
> > that does it.
>
> There's no trick, and no required values. And there's really no special
> trick to handling clustered requests. Either you are doing scatter
> gather and setup your sg tables by going through the complete buffer
> list on the request, or you are just transferring to rq->buffer the
> amount specified by current_nr_sectors. That's it. Really.

Hurrr ... are you saying that the buffers in the bh's in the request are
not contiguous? My reading of the make_request code in 2.2 was that
they were! Has that changed? There is now a reference to an elevator
algorithm in the code, and I can't make out the effect by looking ...
I have been copying the buffer in the request as though it were a single
contiguous whole. If that is not the case, then yes, bang would happen.

My aim in allowing request aggregation was to reduce the number
of ioctl calls I do from the userspace-half of the driver, since I
have to do one ioctl per request in the protocol. A P3 maxes out the
cpu with the driver at just about 300MB/s (cache speed) but I wanted to
go faster on other architectures.

I must apparently also call blk_init_queue, but yes I do.

Thanks for the reply Jens, I appreciate it.

Peter

2001-02-20 23:32:40

by Peter T. Breuer

Subject: Re: plugging in 2.4. Does it work?

"A month of sundays ago Jens Axboe wrote:"
> Forgot to mention that the above doesn't make much sense at all. If
> there are no errors, you loop through ending all the buffers. Then

Yes, that's right, thanks. I know I do one more end_that_request_first
than is necessary, but it is harmless as there is a guard in the
kernel code. At least, it's harmless until someone removes that guard.

It is like that because I added the while loop to the front of the code I
already had working when I decided to try plugging.

> you fall through and end the first (non-existent) chunk? And
> end_that_request_first does not need to hold the io_request_lock,
> you can move that down to protect end_that_request_last.

OK, thanks!

Peter

2001-02-20 23:38:42

by Jens Axboe

Subject: Re: plugging in 2.4. Does it work?

On Wed, Feb 21 2001, Peter T. Breuer wrote:
> Actually I ignore the return value at present. I just wanted to know what
> happened. I haven't the faintest idea whether running end_that_request_last
> MEANS anything.

end_that_request_last's most important job is up'ing any waiters on
the request (not usual), and putting the now completed request back
on the queue's free (or pending) list.

> > end_that_request_last, you are totally screwing the request
> > lists. Maybe this is what's going wrong?
>
> At the time that end_request is run, it's on my own queue, not on the
> kernel's queue. But I am eager for insight ... are you saying that
> after end_that_request_last has run, all bets are off, because the
> thing is completely vamooshed in every possible sense? I guess so.
> But fear not ... I've already taken it off my queue too. I really
> wanted the dequeue return value just in case maybe it would mean that
> I'd have to put it back on my queue and have another attempt at
> acking it.

You cannot put it on different queues when you have run
end_that_request_last on it. For all you know, it may be a part
of a new I/O request as soon as you drop the io_request_lock.
That's why you have to dequeue prior to doing the final end_request, so
none of the lists the request may be on are destroyed.
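
In other words, the safe ordering is roughly the following (a sketch:
blkdev_dequeue_request() is the stock 2.4 helper that unlinks the request
from the queue list; a driver keeping requests on a private list would do
its own list_del there instead):

        unsigned long flags;

        spin_lock_irqsave(&io_request_lock, flags);
        blkdev_dequeue_request(req);    /* off every list first ...        */
        end_that_request_last(req);     /* ... then complete; req may be
                                         * reused the moment the lock drops */
        spin_unlock_irqrestore(&io_request_lock, flags);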

> > > 1) setting read-ahead to 0 disables request aggregation by some means of
> > > which I am not aware, and everything goes hunky dory.
> >
> > Most likely what you are seeing happen is that we will do a
> > wait_on_buffer before we have a chance to merge this request on
> > the queue. Do writes, and you'll see lots of merging.
>
> OK ... that sounds like something to avoid for a while! Wait_on_buffer,
> eh? If it makes things safe, I'm all for trying it myself!

When someone reads the data, that is. The only real reason (as you
discovered) that we get clustered reads is when a lot of read-aheads
are queued. Or if the queue is already doing something else before
the request is started. This is not something you should do in your
driver, but it's worthwhile knowing about so you can optimize
your driver for max performance.

> > There's no trick, and no required values. And there's really no special
> > trick to handling clustered requests. Either you are doing scatter
> > gather and setup your sg tables by going through the complete buffer
> > list on the request, or you are just transferring to rq->buffer the
> > amount specified by current_nr_sectors. That's it. Really.
>
> Hurrr ... are you saying that the buffers in the bh's in the request are
> not contiguous? My reading of the make_request code in 2.2 was that
> they were! Has that changed? There is now a reference to an elevator
> algorithm in the code, and I can't make out the effect by looking ...
> I have been copying the buffer in the request as though it were a single
> contiguous whole. If that is not the case, then yes, bang would happen.

Nothing has changed in this regard at all between 2.2 and 2.4. The
buffers are guaranteed to be sequential sector-wise, but definitely
not contiguous in memory!
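
So a driver that moves the data itself has to walk the chain one
buffer_head at a time, roughly as in this sketch (req is the request being
served, and my_copy_sectors() is a hypothetical stand-in for the driver's
transfer routine):

        struct buffer_head *bh;
        unsigned long sector = req->sector;

        for (bh = req->bh; bh != NULL; bh = bh->b_reqnext) {
                /* each bh has its own b_data; only the sector numbers are
                 * consecutive, not the memory addresses */
                my_copy_sectors(sector, bh->b_data, bh->b_size >> 9);
                sector += bh->b_size >> 9;
        }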

--
Jens Axboe

2001-02-20 23:48:42

by Peter T. Breuer

Subject: Re: plugging in 2.4. Does it work?

"A month of sundays ago Jens Axboe wrote:"
> On Wed, Feb 21 2001, Peter T. Breuer wrote:
> > Hurrr ... are you saying that the buffers in the bh's in the request are
> > not contiguous? My reading of the make_request code in 2.2 was that
> > they were! Has that changed? There is now a reference to an elevator
> > algorithm in the code, and I can't make out the effect by looking ...
> > I have been copying the buffer in the request as though it were a single
> > contiguous whole. If that is not the case, then yes, bang would happen.
>
> Nothing has changed in this regard at all between 2.2 and 2.4. The
> buffers are guaranteed to be sequential sector-wise, but definitely
> not contiguous in memory!

I recall that in 2.2 the make_request code tested that the
buffers were contiguous in memory. From 2.2.18:

/* Can we add it to the end of this request? */
if (back) {
        if (req->bhtail->b_data + req->bhtail->b_size
            != bh->b_data) {
                if (req->nr_segments < max_segments)
                        req->nr_segments++;
                else break;
        }

It looks to me like it tested that the b_data char* pointers of the
two buffer heads being considered are exactly distant by the declared
size of one.

Is that no longer the case? If so, that's my answer.


Peter

2001-02-20 23:52:52

by Jens Axboe

Subject: Re: plugging in 2.4. Does it work?

On Wed, Feb 21 2001, Peter T. Breuer wrote:
> I recall that in 2.2 the make_request code tested that the
> buffers were contiguous in memory. From 2.2.18:
>
> /* Can we add it to the end of this request? */
> if (back) {
>         if (req->bhtail->b_data + req->bhtail->b_size
>             != bh->b_data) {
>                 if (req->nr_segments < max_segments)
>                         req->nr_segments++;
>                 else break;
>         }
>
> It looks to me like it tested that the b_data char* pointers of the
> two buffer heads being considered are exactly distant by the declared
> size of one.
>
> Is that no longer the case? If so, that's my answer.

It will still cluster, the code above checks if the next bh is
contiguous -- if it isn't, then we check if we can grow another segment.
So you may be lucky that some buffer_heads in the chain are indeed
contiguous, that's what the segment count is for. This is exactly
the same in 2.4.
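
For a scatter/gather driver the same test is what decides where a new
segment starts. A sketch of building a (hypothetical) driver-private
segment table from the bh chain:

struct my_segment {
        char *data;
        unsigned int len;
};

static int my_build_segments(struct request *req, struct my_segment *sg,
                             int max_segments)
{
        struct buffer_head *bh = req->bh;
        int nseg = 0;

        sg[0].data = bh->b_data;
        sg[0].len  = bh->b_size;

        for (bh = bh->b_reqnext; bh != NULL; bh = bh->b_reqnext) {
                if (sg[nseg].data + sg[nseg].len == bh->b_data) {
                        /* physically contiguous: extend the current segment */
                        sg[nseg].len += bh->b_size;
                } else {
                        if (++nseg == max_segments)
                                return -1;      /* the queue's segment limit
                                                 * should prevent this */
                        sg[nseg].data = bh->b_data;
                        sg[nseg].len  = bh->b_size;
                }
        }
        return nseg + 1;
}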

--
Jens Axboe

2001-02-21 15:37:04

by Peter T. Breuer

Subject: Re: plugging in 2.4. Does it work?


"Jens Axboe wrote:"
> It will still cluster, the code above checks if the next bh is
> contiguous -- if it isn't, then we check if we can grow another segment.
> So you may be lucky that some buffer_heads in the chain are indeed
> contiguous, that's what the segment count is for. This is exactly
> the same in 2.4.

OK .. that fixed it. Turned out that I wasn't walking the request bh's
on read of clustered requests (but I was doing so on write).

Reads now work fine, but writes show signs of having some problem. I'll
beat on that later.

I'm particularly concerned about the error behaviour. How should I set
up the end_request code in the case when the request is to be errored?
Recall that my end_request code is presently like this:

io_spin_lock
while (end_that_request_first(req, !req->errors))
        ;
// one more time for luck
if (!end_that_request_first(req, !req->errors))
        end_that_request_last(req);
io_spin_unlock

and I get the impression from other driver code snippets that a single
end_that_request_first is enough, but looking at the implementation it
can't be. It looks from ll_rw_blk that I should walk the bh chain just
the same in the case of error, no?

Peter

2001-02-21 17:29:12

by Jens Axboe

Subject: Re: plugging in 2.4. Does it work?

On Wed, Feb 21 2001, Peter T. Breuer wrote:
> I'm particularly concerned about the error behaviour. How should I set
> up the end_request code in the case when the request is to be errored?
> Recall that my end_request code is presently like this:
>
> io_spin_lock
> while (end_that_request_first(req, !req->errors))
>         ;
> // one more time for luck
> if (!end_that_request_first(req, !req->errors))
>         end_that_request_last(req);
> io_spin_unlock
>
> and I get the impression from other driver code snippets that a single
> end_that_request_first is enough, but looking at the implementation it
> can't be. It looks from ll_rw_blk that I should walk the bh chain just
> the same in the case of error, no?

The implementation in ll_rw_blk.c (and other places) assumes that
a failed request just means the first chunk failed, and it then makes
sense to just end i/o on that buffer and re-set up the request for the
next buffer. If you want to completely scrap the request on an error,
then you'll just have to do that manually (i.e. loop end_that_request_first
and call end_that_request_last at the end).

void my_end_request(struct request *rq, int uptodate)
{
        while (end_that_request_first(rq, uptodate))
                ;

        io_lock
        end_that_request_last(rq);
        io_unlock
}
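
Spelled out with the actual lock (io_lock/io_unlock above standing for
io_request_lock) and the name argument of end_that_request_first() restored,
that comes to roughly:

void my_end_request(struct request *rq, int uptodate)
{
        unsigned long flags;

        /* assumes rq is already off any queue list; as noted earlier,
         * end_that_request_first() does not need io_request_lock */
        while (end_that_request_first(rq, uptodate, DEVICE_NAME))
                ;

        spin_lock_irqsave(&io_request_lock, flags);
        end_that_request_last(rq);
        spin_unlock_irqrestore(&io_request_lock, flags);
}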

And why you keep insisting on a duplicate end_that_request_first I don't
know?!

--
Jens Axboe

2001-02-21 17:55:24

by Peter T. Breuer

Subject: Re: plugging in 2.4. Does it work?

"A month of sundays ago Jens Axboe wrote:"
> The implementation in ll_rw_blk.c (and other places) assumes that
> a failed request just means the first chunk failed, and it then makes
> sense to just end i/o on that buffer and re-set up the request for the
> next buffer. If you want to completely scrap the request on an error,
> then you'll just have to do that manually (i.e. loop end_that_request_first
> and call end_that_request_last at the end).
>
> void my_end_request(struct request *rq, int uptodate)
> {
>         while (end_that_request_first(rq, uptodate))
>                 ;
>
>         io_lock
>         end_that_request_last(rq);
>         io_unlock
> }

OK, thanks!

> And why you keep insisting on a duplicate end_that_request_first I don't
> know?!

Backwards compatibility. The code has to keep working under older
kernel versions too in order to pass the regression tests, and the
simplest thing is to put the front while loop in an ifdef while I work
on it. At some stage soon I'll do as you want. Things have stopped
oopsing on write now, and I've sent it for testing.
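
(The guard itself is just the usual version ifdef with
LINUX_VERSION_CODE/KERNEL_VERSION from <linux/version.h>; a sketch of the
shape, around the extra while loop only:)

#if LINUX_VERSION_CODE >= KERNEL_VERSION(2,4,0)
        /* 2.4: requests may be clustered, so drain every buffer_head */
        while (end_that_request_first(req, !req->errors, DEVICE_NAME))
                ;
#endif
        if (!end_that_request_first(req, !req->errors, DEVICE_NAME))
                end_that_request_last(req);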

That's also why I have the io_lock around the whole code instead of
only the last part. I have to put it around something that at least
existed in 2.2 and even 2.0. I've stepped the version number and will
be revising the code while it's being tested. But thanks for the
concern :-). I appreciate it!

Peter