2002-03-13 13:05:49

by Roman Zippel

[permalink] [raw]
Subject: 2.5.6: ide driver broken in PIO mode

Hi,

I first noticed the problem on my Amiga, but I can reproduce it on an ia32
machine, when I turn off dma with hdparm. After a while the driver stops
working with:

hda: lost interrupt

After rebooting e2fsck has some serious damage to repair, so the driver
even writes garbage back to disk before stopping.
I can reproduce the problem pretty easily (some disk activity like from a
kernel compile is already enough) and it happens on two completely
different systems, so it should be reproducable on other systems too.

bye, Roman


2002-03-13 17:43:30

by Linus Torvalds

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
Roman Zippel <[email protected]> wrote:
>
>I first noticed the problem on my Amiga, but I can reproduce it on an ia32
>machine, when I turn off dma with hdparm.

With PIO, the current IDE/bio stuff doesn't like the write-multiple
interface and has bad interactions.

Jens, you talked about a patch from Supparna two weeks ago, any
progress?

Linus

2002-03-13 18:15:40

by Andre Hedrick

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

On Wed, 13 Mar 2002, Linus Torvalds wrote:

> In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
> Roman Zippel <[email protected]> wrote:
> >
> >I first noticed the problem on my Amiga, but I can reproduce it on an ia32
> >machine, when I turn off dma with hdparm.
>
> With PIO, the current IDE/bio stuff doesn't like the write-multiple
> interface and has bad interactions.
>
> Jens, you talked about a patch from Supparna two weeks ago, any
> progress?

Linus,

Suparna understands the problem and it is a solution I have described,
and I have been working with her on a solution but I suspect you will not
take it. Because it requires an in process operation of traversing
several BIOS' in order to statisfy the hardware atomic. Until you figure
out, you can not have the partial completion update to the granularity of
a single page or single bio it can not be fixed. You have to permit the
hardware to satisfy its needs and have an active in process list it will
never work.

So let me know when you want a solution that is correct and proper to the
hardward and then it will be fixed. If you continue to refuse the correct
and proper state diagram operation, you will continue to play fast and
loose with people's data.

The fact is a few people understand the problem and the solution.
By now I think you are just now getting the problem.

You can have your partial completions, but you are not going to have them
on your granularity scale, period.

If you want them on your granularity scale, you will carry the
responsiblity of "FAST and LOOSE with the DATA".

Regards,

Andre Hedrick

2002-03-13 20:08:57

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

Andre Hedrick wrote:
>
> ...
> Suparna understands the problem and it is a solution I have described,
> and I have been working with her on a solution but I suspect you will not
> take it. Because it requires an in process operation of traversing
> several BIOS' in order to statisfy the hardware atomic.

I think I understood the problem a couple of days ago. Let
me try to express it:

For a PIO-mode transfer, the driver needs to pump (say) 8192
bytes under CPU control into the host controller. Later, an
interrupt comes back from the device which says "that worked",
or "I/O error".

So clearly the driver has to not signal completion against those
8192 bytes until the interrupt comes in.

So the driver needs to hang on to those 8192 bytes.

And those 8192 bytes may be represented by two vectors in a single
BIO. That's OK. The problem occurs when those 8192 bytes are
encapsulated in *mutliple* BIOs in a single request. This is the
common case.

Apparently, the interface which is offered to the device driver
requires that the driver signal completion against the Nth BIO
before the driver can advance onto the N+1th BIO in the request.
As a consequence of this, the driver is accidentally signalling
completion against the Nth BIO *before* that interrupt has come
back. So we're telling userspace that the read or write was OK
before we actually know that this is true.

So it seems that the IDE driver currently has some non-working
workarounds in place for this problem.

Andre, is this an accurate description?

If so, then it would seem that either

a) Someone needs to tell Andre (in simple, storage-person terms :))
what the recommended workaround is or

b) The block layer needs to be extended so the driver can walk
all the request's segments prior to signalling completion
against any of them.

-

2002-03-13 20:34:51

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

On Wed, Mar 13 2002, Linus Torvalds wrote:
> In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
> Roman Zippel <[email protected]> wrote:
> >
> >I first noticed the problem on my Amiga, but I can reproduce it on an ia32
> >machine, when I turn off dma with hdparm.
>
> With PIO, the current IDE/bio stuff doesn't like the write-multiple
> interface and has bad interactions.
>
> Jens, you talked about a patch from Supparna two weeks ago, any
> progress?

I can fix this tomorrow, the stuff from Supparna was just a small bio
helper to retrieve the segments in a request. Just what we need for
write-multi.

--
Jens Axboe

2002-03-13 20:38:12

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

On Wed, Mar 13 2002, Andre Hedrick wrote:
> On Wed, 13 Mar 2002, Linus Torvalds wrote:
>
> > In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
> > Roman Zippel <[email protected]> wrote:
> > >
> > >I first noticed the problem on my Amiga, but I can reproduce it on an ia32
> > >machine, when I turn off dma with hdparm.
> >
> > With PIO, the current IDE/bio stuff doesn't like the write-multiple
> > interface and has bad interactions.
> >
> > Jens, you talked about a patch from Supparna two weeks ago, any
> > progress?
>
> Linus,
>
> Suparna understands the problem and it is a solution I have described,
> and I have been working with her on a solution but I suspect you will not
> take it. Because it requires an in process operation of traversing
> several BIOS' in order to statisfy the hardware atomic. Until you figure
> out, you can not have the partial completion update to the granularity of
> a single page or single bio it can not be fixed. You have to permit the
> hardware to satisfy its needs and have an active in process list it will
> never work.

Again, this is not needed, there are two ways at doing this. One is to
traverse the segments and setup the drive for the full transfer with
write (or read) multi. Drive will expect nr_sectors transfer and do it
xx sectors of the time as programmed by set multi. The other approach is
what I tried last time (but with the early-interrupt fixed), just
program the drive for no more than current_nr_sectors and simply let the
command request nr_sectors / current_nr_sectors times.

The latter approach also satisfies 'the hardware atomic' as you call it.

> So let me know when you want a solution that is correct and proper to the

Still all talk, I'm guessing.

> The fact is a few people understand the problem and the solution.
> By now I think you are just now getting the problem.

No the fact is that you _think_ only a few people understand the
problem.

--
Jens Axboe

2002-03-13 22:11:03

by Martin Dalecki

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

Jens Axboe wrote:
> On Wed, Mar 13 2002, Linus Torvalds wrote:
>
>>In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
>>Roman Zippel <[email protected]> wrote:
>>
>>>I first noticed the problem on my Amiga, but I can reproduce it on an ia32
>>>machine, when I turn off dma with hdparm.
>>>
>>With PIO, the current IDE/bio stuff doesn't like the write-multiple
>>interface and has bad interactions.
>>
>>Jens, you talked about a patch from Supparna two weeks ago, any
>>progress?
>>
>
> I can fix this tomorrow, the stuff from Supparna was just a small bio
> helper to retrieve the segments in a request. Just what we need for
> write-multi.

Yeep - I'm glad you are around :-).

2002-03-13 22:23:05

by Alan

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

> b) The block layer needs to be extended so the driver can walk
> all the request's segments prior to signalling completion
> against any of them.

This would be good - batching blocks by peeking down the queue is good
for raid cards which tend to want stripe sized chunks

2002-03-14 05:23:17

by Andre Hedrick

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

On Wed, 13 Mar 2002, Alan Cox wrote:

> > b) The block layer needs to be extended so the driver can walk
> > all the request's segments prior to signalling completion
> > against any of them.
>
> This would be good - batching blocks by peeking down the queue is good
> for raid cards which tend to want stripe sized chunks
>

Thank you Alan for the kind words.

Would you agree having the ablity to mark various BIOS' in process but not
retrun them back to the upper layers until the device driver can be
positive of a successful transfer?

The suggestion I put forward over many nights with the brillant Suparna,
was to create an in process marker on the a BIO handed to the driver.
She had enough guts to try out the idea but there still was a piece
missing. We determined the needed part is an second path the do the
update delays and another function to clear any hold on the BIO for
release back to the top layers.

This has been my request from the beginning of BIO but not in the exact
words. BIO does not address the granularity of the the hardware segment.
It does address all DMA transfers, but then again there are no partial
completions for the most part. DMA speed is vastly quicker than PIO and
it is not possible to do partial updates in any sane way.

So what I am gathering from Alan is other hardware would benefit from this
feature/requirement. Maybe now something will move forward.

Cheers,

Andre

2002-03-14 05:57:38

by Andre Hedrick

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode


Jens,

How about telling me this much since you are claiming an answer and
solution. Where exactly do you plan to determine whether the current
device data block is a success or failure? This is regardless of read or
write direction.

For the record the single sector transfers do not handle this case either.

Maybe if the two of you will provide some room to get work done, I can
finish what I started and get out of your way.

As it is clear I am being shown the door.

I will go soon enough but I owe it to the folks out there to finish
something I started which I have wait to see if anyone could finish, this
includes the newly appointed Martin.

Your answer is incomplete, Jens.

However, I will stand back and let you do it and then proceed in a lab
with analyzers to provide the traces of short comings.

Regards,

Andre

On Wed, 13 Mar 2002, Jens Axboe wrote:

> On Wed, Mar 13 2002, Andre Hedrick wrote:
> > On Wed, 13 Mar 2002, Linus Torvalds wrote:
> >
> > > In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
> > > Roman Zippel <[email protected]> wrote:
> > > >
> > > >I first noticed the problem on my Amiga, but I can reproduce it on an ia32
> > > >machine, when I turn off dma with hdparm.
> > >
> > > With PIO, the current IDE/bio stuff doesn't like the write-multiple
> > > interface and has bad interactions.
> > >
> > > Jens, you talked about a patch from Supparna two weeks ago, any
> > > progress?
> >
> > Linus,
> >
> > Suparna understands the problem and it is a solution I have described,
> > and I have been working with her on a solution but I suspect you will not
> > take it. Because it requires an in process operation of traversing
> > several BIOS' in order to statisfy the hardware atomic. Until you figure
> > out, you can not have the partial completion update to the granularity of
> > a single page or single bio it can not be fixed. You have to permit the
> > hardware to satisfy its needs and have an active in process list it will
> > never work.
>
> Again, this is not needed, there are two ways at doing this. One is to
> traverse the segments and setup the drive for the full transfer with
> write (or read) multi. Drive will expect nr_sectors transfer and do it
> xx sectors of the time as programmed by set multi. The other approach is
> what I tried last time (but with the early-interrupt fixed), just
> program the drive for no more than current_nr_sectors and simply let the
> command request nr_sectors / current_nr_sectors times.
>
> The latter approach also satisfies 'the hardware atomic' as you call it.
>
> > So let me know when you want a solution that is correct and proper to the
>
> Still all talk, I'm guessing.
>
> > The fact is a few people understand the problem and the solution.
> > By now I think you are just now getting the problem.
>
> No the fact is that you _think_ only a few people understand the
> problem.
>
> --
> Jens Axboe
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


2002-03-14 16:05:47

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

Today was one of those days when I didn't get time to look through
lkml, and almost missed this thread. Thanks Andre, for mentioning
it to me.

On Wed, Mar 13, 2002 at 09:34:08PM +0100, Jens Axboe wrote:
> On Wed, Mar 13 2002, Linus Torvalds wrote:
> > In article <Pine.LNX.4.21.0203131339050.26768-100000@serv>,
> > Roman Zippel <[email protected]> wrote:
> > >
> > >I first noticed the problem on my Amiga, but I can reproduce it on an ia32
> > >machine, when I turn off dma with hdparm.
> >
> > With PIO, the current IDE/bio stuff doesn't like the write-multiple
> > interface and has bad interactions.
> >
> > Jens, you talked about a patch from Supparna two weeks ago, any
> > progress?
>
> I can fix this tomorrow, the stuff from Supparna was just a small bio
> helper to retrieve the segments in a request. Just what we need for
> write-multi.
>

That's sort of it. It's just at bio/rq level - I haven't touched
IDE (don't have that expertise).

However, the latest code I have also covers the avoidance of bv_len,
bv_offset modifications by the block layer, which I'd been
concerned about for quite a while and ought to have done something about
much sooner ;) With this change one can safetly stick in a kvec in a
bio for example and possibly perform a copyout or something post i/o
using the same descriptors. (Ben, this is a FYI for you)

Anyway, here is the last patch I had sent to Andre (Jens, this is
more recent than the earlier helper snippets). It is based on 2.5.5
though, and I have only tried it on a SCSI system for regression
(no real testing, just tried booting the kernel and working with it).
Some changes are needed in IDE so it works with this (see rq_map_buffer
and comments later). So there's work to go.

I wanted to wait till Andre got to integrate this with IDE along
with his modifications and let me know if it really helps solve
the problem and addresses the requirements he's been highlighting
all this while. The interfaces need to be refined in accordance with
that feedback, and this patch is incomplete without those IDE
changes. But now that the discussion has caught on, thought I'd share it
as is anyway for feedback etc ... in the meantime.
(Caution: Don't try it as is on an IDE system right now, unless
you modify ide_map_buffer as discussed below ...)

Jens, could you check the comment on blk_rq_recalc_segments during
end_that_request_first ? I remember having discussed this with you
long back, but now I can't recall the situations where we need to
recalculate the segments for a request that is being processed.

Regards
Suparna

---------------------------------------------------------------


Description
-------------

(a) Have added a couple of fields (rq->hard_bio and bio->bi_hard_idx).
So now:

1.Fields rq->hard_bio, rq->hard_bio->bi_hard_idx, rq->hard_cur_sectors,
rq->hard_sector and rq->hard_nr_sectors are the ones which get updated
on true i/o completion, i.e. when end_that_request_first is called.
These constitute the state that indicates which parts of the request
are yet to finish.

2. Fields rq->bio, rq->bio->bi_idx, rq->current_nr_sectors, rq->sector
and rq->nr_sectors are the ones used for keeping track of i/o
submission (i.e. these consitute the state that indicates which parts
of the request are yet to be submitted).
These are the fields which a driver would use when processing requests.

3. end_that_request_first now operates on the *hard* values, but it
also takes care of bringing the submission pointers up-to-date if
they are still behind.
(So, if the submission pointers are ahead of the current completion
pointers, then end_that_request_first would leave them alone)

(b) No longer modify bv_len, bv_offset fields during i/o completion or
submission. Thus the mapping provided by bio_map_irq always reflects
the start of the segment.
(This affects ide_map_buffer. Now there is a new macro rq_map_buffer
which does this correctly. IDE may need to be updated accordingly)

We can find out our relative offset wrt the start of the segment
through the following computation:
bv_len - current_nr_sectors << 9


(c) New routines

Let me know if the comments or the naming doesn't describe the intent
correctly - this part needs some work. I haven't actually used them
in the code as yet, so do be careful if you try them out.
I know I need to refine this part of the code.

- blk_rq_next_segment (this name might be confusing as all it really
does is move the pointers to point to the start of the next segment
if we are already at the end of the last segment
- suggestions required for a better name !)
- process_that_request_first (don't really like it as it is right
now; just put it in to illustrate how blk_rq_next_segment
could be used - need to work out better and more efficient semantics
for this)
- rq_map_buffer (along the lines of what ide_map_buffer did)
- rq_unmap_buffer


Regards
Suparna

--------- Cut here (biotraverse.patch) -------------

diff -ur pure-255/drivers/block/ll_rw_blk.c /usr/src/linux255-bio/drivers/block/ll_rw_blk.c
--- pure-255/drivers/block/ll_rw_blk.c Mon Mar 4 20:55:30 2002
+++ /usr/src/linux255-bio/drivers/block/ll_rw_blk.c Wed Mar 6 14:09:06 2002
@@ -1550,16 +1550,30 @@
rq->nr_hw_segments = nr_hw_segs;
}

-inline void blk_recalc_rq_sectors(struct request *rq, int nsect)
+void blk_recalc_rq_sectors(struct request *rq, int nsect)
{
if (rq->flags & REQ_CMD) {
+ BIO_BUG_ON(rq->hard_cur_sectors < nsect);
+ BIO_BUG_ON(rq->hard_bio->bi_hard_idx >= rq->hard_bio->bi_vcnt);
rq->hard_sector += nsect;
rq->hard_nr_sectors -= nsect;
- rq->sector = rq->hard_sector;
- rq->nr_sectors = rq->hard_nr_sectors;
-
- rq->current_nr_sectors = bio_iovec(rq->bio)->bv_len >> 9;
- rq->hard_cur_sectors = rq->current_nr_sectors;
+ rq->hard_cur_sectors -= nsect;
+ if ((rq->hard_cur_sectors == 0) && rq->hard_nr_sectors) {
+ rq->hard_cur_sectors = bio_iovec_idx(
+ rq->hard_bio, rq->hard_bio->bi_hard_idx)->bv_len >> 9;
+ }
+
+ /* Move the i/o submission pointers ahead if required */
+ if ((rq->nr_sectors >= rq->hard_nr_sectors) &&
+ (rq->sector <= rq->hard_sector)){
+ rq->sector = rq->hard_sector;
+ rq->nr_sectors = rq->hard_nr_sectors;
+ rq->current_nr_sectors = rq->hard_cur_sectors;
+ rq->bio = rq->hard_bio;
+ rq->bio->bi_idx = rq->bio->bi_hard_idx;
+ rq->buffer = bio_data(rq->bio) +
+ bio_iovec(rq->bio)->bv_len - rq->current_nr_sectors;
+ }

/*
* if total number of sectors is less than the first segment
@@ -1592,17 +1606,19 @@
int nsect, total_nsect;
struct bio *bio;

+
req->errors = 0;
if (!uptodate)
printk("end_request: I/O error, dev %s, sector %lu\n",
kdevname(req->rq_dev), req->sector);

total_nsect = 0;
- while ((bio = req->bio)) {
- nsect = bio_iovec(bio)->bv_len >> 9;

- BIO_BUG_ON(bio_iovec(bio)->bv_len > bio->bi_size);
+ /* our starting point may be in the middle of a segment */
+ while ((bio = req->hard_bio)) {

+ nsect = req->hard_cur_sectors;
+ BIO_BUG_ON(nsect > bio->bi_size);
/*
* not a complete bvec done
*/
@@ -1610,38 +1626,43 @@
int residual = (nsect - nr_sectors) << 9;

bio->bi_size -= residual;
- bio_iovec(bio)->bv_offset += residual;
- bio_iovec(bio)->bv_len -= residual;
blk_recalc_rq_sectors(req, nr_sectors);
- blk_recalc_rq_segments(req);
+
+ /*
+ * Could we just do without recalc segments ?
+ *- Suparna
+ *blk_recalc_rq_segments(req);
+ */
return 1;
}

/*
* account transfer
*/
- bio->bi_size -= bio_iovec(bio)->bv_len;
- bio->bi_idx++;
+ bio->bi_size -= nsect << 9;
+ bio->bi_hard_idx++;

nr_sectors -= nsect;
total_nsect += nsect;

if (!bio->bi_size) {
- req->bio = bio->bi_next;
-
+ req->hard_bio = bio->bi_next;
bio_endio(bio, uptodate);
-
total_nsect = 0;
}

- if ((bio = req->bio)) {
+ if ((bio = req->hard_bio)) {
blk_recalc_rq_sectors(req, nsect);

/*
* end more in this run, or just return 'not-done'
*/
if (unlikely(nr_sectors <= 0)) {
- blk_recalc_rq_segments(req);
+ /*
+ * Could we just do without recalc segments ?
+ * - Suparna
+ * blk_recalc_rq_segments(req);
+ */
return 1;
}
}
diff -ur pure-255/include/linux/bio.h /usr/src/linux255-bio/include/linux/bio.h
--- pure-255/include/linux/bio.h Mon Mar 4 20:57:52 2002
+++ /usr/src/linux255-bio/include/linux/bio.h Mon Mar 4 20:18:54 2002
@@ -68,6 +68,7 @@

unsigned short bi_vcnt; /* how many bio_vec's */
unsigned short bi_idx; /* current index into bvl_vec */
+ unsigned short bi_hard_idx; /* next unfinished vec */

/* Number of segments in this BIO after
* physical address coalescing is performed.
diff -ur pure-255/include/linux/blk.h /usr/src/linux255-bio/include/linux/blk.h
--- pure-255/include/linux/blk.h Mon Mar 4 20:56:13 2002
+++ /usr/src/linux255-bio/include/linux/blk.h Tue Mar 5 17:31:19 2002
@@ -52,6 +52,7 @@
extern inline struct request *elv_next_request(request_queue_t *q)
{
struct request *rq;
+ struct bio *bio;

while ((rq = __elv_next_request(q))) {
rq->flags |= REQ_STARTED;
@@ -59,6 +60,11 @@
if (&rq->queuelist == q->last_merge)
q->last_merge = NULL;

+ /* Remember where we are to start with */
+ rq->hard_bio = rq->bio;
+ rq_for_each_bio(bio, rq)
+ bio->bi_hard_idx = bio->bi_idx;
+
if ((rq->flags & REQ_DONTPREP) || !q->prep_rq_fn)
break;

@@ -403,6 +409,35 @@
blkdev_dequeue_request(req);
end_that_request_last(req);
}
+
+/*
+ * May be used for processing bio's while submitting i/o without
+ * signalling completion. Fails if more data is requested than is
+ * available in the request in which case it doesn't advance any
+ * pointers.
+ * (Assumes a request is correctly set up. No sanity checks)
+ */
+extern inline int process_that_request_first(struct request *req,
+ int nr_sectors)
+{
+ if (req->nr_sectors < nr_sectors)
+ return 0;
+
+ req->nr_sectors -= nr_sectors;
+ req->sector += nr_sectors;
+ while (nr_sectors) {
+ if (req->current_nr_sectors >= nr_sectors ) {
+ req->current_nr_sectors -= nr_sectors;
+ nr_sectors = 0;
+ } else {
+ nr_sectors -= req->current_nr_sectors;
+ req->current_nr_sectors = 0;
+ }
+ blk_rq_next_segment(req);
+ }
+ return 1;
+}
+

#endif /* ! SCSI_BLK_MAJOR(MAJOR_NR) */
#endif /* LOCAL_END_REQUEST */
diff -ur pure-255/include/linux/blkdev.h /usr/src/linux255-bio/include/linux/blkdev.h
--- pure-255/include/linux/blkdev.h Mon Mar 4 20:56:21 2002
+++ /usr/src/linux255-bio/include/linux/blkdev.h Mon Mar 4 20:27:02 2002
@@ -59,7 +59,9 @@
void *special;
char *buffer;
struct completion *waiting;
- struct bio *bio, *biotail;
+ struct bio *bio; /* next bio to submit */
+ struct bio *biotail;
+ struct bio *hard_bio; /* next unfinished bio */
request_queue_t *q;
struct request_list *rl;
};
@@ -222,6 +224,53 @@
* scheduler -- see elv_next_request
*/
#define blk_queue_headactive(q, head_active)
+
+
+/*
+ * temporarily mapping a (possible) highmem bio for typically for PIO transfer
+ */
+
+/* offset with respect to start of the segment */
+#define blk_rq_offset(rq) (bio_iovec((rq)->bio)->bv_len - ((rq)->current_nr_sectors << 9))
+
+extern inline void *rq_map_buffer(struct request *rq, unsigned long *flags)
+{
+ return bio_kmap_irq(rq->bio, flags) + blk_rq_offset(rq);
+}
+
+extern inline void rq_unmap_buffer(char *buffer, unsigned long *flags)
+{
+ bio_kunmap_irq(buffer, flags);
+}
+
+
+/*
+ * Points to the next segment in the request if the current segment
+ * is complete. Leaves things unchanged if this segment is not over or
+ * if no more segments are left in this request.
+ * Meant to be used for bio traversal during i/o submission
+ * Does not effect any i/o completions, and does not touch any of the
+ * hard* values in the request or bio
+ * (Decrementing rq->nr_sectors and rq->current_nr_sectors as data is
+ * transferred is the caller's responsibility)
+ */
+extern inline void blk_rq_next_segment(struct request *rq)
+{
+ if (rq->current_nr_sectors <= 0) {
+ struct bio *bio = rq->bio;
+
+ if (bio->bi_idx < bio->bi_vcnt - 1) {
+ bio->bi_idx++;
+ } else {
+ bio = bio->bi_next;
+ }
+ if (bio) {
+ rq->bio = bio;
+ rq->current_nr_sectors = bio_iovec(bio)->bv_len >> 9;
+ }
+ }
+}
+

extern unsigned long blk_max_low_pfn, blk_max_pfn;


2002-03-14 18:44:37

by Andrew Morton

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

Suparna Bhattacharya wrote:
>
> ...
> However, the latest code I have also covers the avoidance of bv_len,
> bv_offset modifications by the block layer, which I'd been
> concerned about for quite a while and ought to have done something about
> much sooner ;)

urgh. I didn't know there was a risk of this.

I'm using bv_offset and bv_len in the bi_end_io handler to work out
whether to unlock the final page in the multipage BIO.

That can probably be avoided, but it would be better if these
can be left alone, or at least, restored to their original value
before returning the BIO to whoever created it.

I'm also using bi_private, under the assumption that the ownership
rules for that are analogous to buffer_head.b_private. Is this
correct? Who owns bi_private?


-

2002-03-14 18:51:39

by Jens Axboe

[permalink] [raw]
Subject: Re: 2.5.6: ide driver broken in PIO mode

On Thu, Mar 14 2002, Andrew Morton wrote:
> Suparna Bhattacharya wrote:
> >
> > ...
> > However, the latest code I have also covers the avoidance of bv_len,
> > bv_offset modifications by the block layer, which I'd been
> > concerned about for quite a while and ought to have done something about
> > much sooner ;)
>
> urgh. I didn't know there was a risk of this.
>
> I'm using bv_offset and bv_len in the bi_end_io handler to work out
> whether to unlock the final page in the multipage BIO.
>
> That can probably be avoided, but it would be better if these
> can be left alone, or at least, restored to their original value
> before returning the BIO to whoever created it.

Suparna's addition will be added, so we maintain the same length and
offset throughout.

> I'm also using bi_private, under the assumption that the ownership
> rules for that are analogous to buffer_head.b_private. Is this
> correct? Who owns bi_private?

Same semantics as b_priate, so don't worry :)

--
Jens Axboe