Hi!
I was just investigating where bio->bi_idx gets modified in the kernel.
I found these lines in ide-disk.c in ide_multwrite (DMA off, TASKFILE_IO
off):
> if (++bio->bi_idx >= bio->bi_vcnt) {
> bio->bi_idx = 0;
> bio = bio->bi_next;
> }
(rq->bio also gets changed but it's protected by the scratch buffer)
I think changing the bi_idx here is dangerous because
end_that_request_first needs this value to be unchanged because it
tracks the progress of the bio processing and updates bi_idx itself.
And bio->bi_idx = 0 is probably wrong because the bio can be submitted
with bio->bi_idx > 0 (if the bio was splitted and there are clones that
share the bio_vec array, like raid or device-mapper code).
If it really needs to play with bi_idx itself care should be taken to
reset bi_idx to the original value, not to zero.
I wasn't able to trigger a problem though, I don't know why exactly,
perhaps there are paths in __end_that_request_first that are not
interested in bi_dx. I still think there is something wrong with it.
Christophe,
You have to walk either the active or scratch BIO to satitsfy the FSM for
completion of a "DATA_BLOCK". Where "DATA_BLOCK" is variable in lenght.
Also you may not update or acknowledge the return of any BIOS regardless
until the status of the "DATA_BLOCK" is known. Status is always checked
after the transfer but you are not permitted to check it then in pio
period (excluding soft-poll completions, unsupported in Linux). Only
after the next interrupt returns can you querry the status of the previous
"DATA_BLOCK" transfer. Then and only then can you leave the FSM to deal
with the needs of BLOCK.
Trust that Bartlomiej gets the point, I spent a long time making sure
somebody did before I burned out.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Thu, 1 Jan 2004, Christophe Saout wrote:
> Hi!
>
> I was just investigating where bio->bi_idx gets modified in the kernel.
>
> I found these lines in ide-disk.c in ide_multwrite (DMA off, TASKFILE_IO
> off):
>
> > if (++bio->bi_idx >= bio->bi_vcnt) {
> > bio->bi_idx = 0;
> > bio = bio->bi_next;
> > }
>
> (rq->bio also gets changed but it's protected by the scratch buffer)
>
> I think changing the bi_idx here is dangerous because
> end_that_request_first needs this value to be unchanged because it
> tracks the progress of the bio processing and updates bi_idx itself.
>
> And bio->bi_idx = 0 is probably wrong because the bio can be submitted
> with bio->bi_idx > 0 (if the bio was splitted and there are clones that
> share the bio_vec array, like raid or device-mapper code).
>
> If it really needs to play with bi_idx itself care should be taken to
> reset bi_idx to the original value, not to zero.
>
> I wasn't able to trigger a problem though, I don't know why exactly,
> perhaps there are paths in __end_that_request_first that are not
> interested in bi_dx. I still think there is something wrong with it.
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Friday 02 January 2004 07:18, Christophe Saout wrote:
> Hi!
Hi,
> I was just investigating where bio->bi_idx gets modified in the kernel.
>
> I found these lines in ide-disk.c in ide_multwrite (DMA off, TASKFILE_IO
>
> off):
> > if (++bio->bi_idx >= bio->bi_vcnt) {
> > bio->bi_idx = 0;
> > bio = bio->bi_next;
> > }
>
> (rq->bio also gets changed but it's protected by the scratch buffer)
>
> I think changing the bi_idx here is dangerous because
> end_that_request_first needs this value to be unchanged because it
> tracks the progress of the bio processing and updates bi_idx itself.
This is not a problem here because ide_multwrite() walks rq->bio chain itself.
It also updates current_nr_sectors and hard_cur_sectors fields of drive->wrq.
> And bio->bi_idx = 0 is probably wrong because the bio can be submitted
> with bio->bi_idx > 0 (if the bio was splitted and there are clones that
> share the bio_vec array, like raid or device-mapper code).
>
> If it really needs to play with bi_idx itself care should be taken to
> reset bi_idx to the original value, not to zero.
RAID or device-mapper code doesn't seem to care about bio->bi_idx
value after bio has been submitted to the block layer, so the current
code is safe enough. Also there is no place to store original bi_idx.
> I wasn't able to trigger a problem though, I don't know why exactly,
> perhaps there are paths in __end_that_request_first that are not
> interested in bi_dx. I still think there is something wrong with it.
After finishing data transfer multwrite_intr() calls ide_end_request()
with rq->nr_sectors argument (where rq is hwgroup->rq not drive->wrq),
so only whole bios are completed. There are no partial completions
and code depending on bio->bi_idx inside __end_that_request_first()
is not executed.
The real (generic) problem is that atomic block segment for a block device
(in this case ATA disk) can be composed of bvecs, bios or bios+bvecs and
driver can obtain information about next bvec from block layer (from rq->bio)
only after previous bvec has been acknowledgment by end_that_request_first().
In situation when information about previously processed bios/bvecs is needed
(ie. error condition) this information is already lost.
There are 2 solutions for this problem:
- Use separate bio lists (rq->cbio) and temporary data
(rq->nr_cbio_segments and rq->nr_cbio_sectors) for submission/completion.
Please look at process_that_request_first() and its usage in TASKFILE code.
You are then required to do partial bio completion.
- Do not use struct request fields directly and store information needed by
driver in separate data structures
(ie. scatter-gather data stored by SCSI layer).
You are then not allowed (and shouldn't need) to do partial bio completions.
IDE multi-sector code uses first method because second one was too
invasive/risky to be done (required serious rewrite of IDE transport code).
I hope generic block transport layer in 2.7.x will make life simpler.
--bart
Am Fr, den 02.01.2004 schrieb Bartlomiej Zolnierkiewicz um 02:27:
> > I was just investigating where bio->bi_idx gets modified in the kernel.
> >
> > I found these lines in ide-disk.c in ide_multwrite (DMA off, TASKFILE_IO
> >
> > off):
> > > if (++bio->bi_idx >= bio->bi_vcnt) {
> > > bio->bi_idx = 0;
> > > bio = bio->bi_next;
> > > }
> >
> > (rq->bio also gets changed but it's protected by the scratch buffer)
> >
> > I think changing the bi_idx here is dangerous because
> > end_that_request_first needs this value to be unchanged because it
> > tracks the progress of the bio processing and updates bi_idx itself.
>
> This is not a problem here because ide_multwrite() walks rq->bio chain itself.
> It also updates current_nr_sectors and hard_cur_sectors fields of drive->wrq.
Yes, I've seen this. That looks okay.
> > And bio->bi_idx = 0 is probably wrong because the bio can be submitted
> > with bio->bi_idx > 0 (if the bio was splitted and there are clones that
> > share the bio_vec array, like raid or device-mapper code).
> >
> > If it really needs to play with bi_idx itself care should be taken to
> > reset bi_idx to the original value, not to zero.
>
> RAID or device-mapper code doesn't seem to care about bio->bi_idx
> value after bio has been submitted to the block layer, so the current
> code is safe enough.
Yes, that's right. But I'd like to. I see that the code works this way,
but still it's somewhat incorrect and I'll run into trouble if I want to
do a certain thing. Well, you could simply say "then don't do it", but
hey. ;)
I'm working on a dm encryption target. So I need to allocate and manager
buffers. Under memory pressure (or if dm decided before thta) it can
happen that a bio is split up. But then to avoid deadlocks due to memory
shortage I need to free my buffers up as soon as possible. If a bio
returns (it doesn't even need to be a partial completion) I need to know
which pages I can free.
The way I would prefer is that when someone calls bio_endio the bi_idx
and bv_offset just point where the processed data begins.
Most drivers complete a bio at once and leave bi_idx where it was.
That's fine. With a very small modification end_that_request_first can
also follow the rule that I just outlined.
I implemented the buffer free mechanism this way and it works fine. I
added a lot of debug to make sure all pages get freed correctly and
funny enough everything works fine, I'm not even able to trigger
problems where I expect them. But I still don't trust this.
> Also there is no place to store original bi_idx.
That seems to be the key problem, sometimes. The other thing I could do
is to use bi_vcnt and bi_size and then go backwards through the bvecs to
find the pages and ignore the whole bi_idx issue. But this is ugly.
> After finishing data transfer multwrite_intr() calls ide_end_request()
> with rq->nr_sectors argument (where rq is hwgroup->rq not drive->wrq),
> so only whole bios are completed. There are no partial completions
> and code depending on bio->bi_idx inside __end_that_request_first()
> is not executed.
Yes, I suspected this. This part hardly seems to be ever used.
> The real (generic) problem is that atomic block segment for a block device
> (in this case ATA disk) can be composed of bvecs, bios or bios+bvecs and
> driver can obtain information about next bvec from block layer (from rq->bio)
> only after previous bvec has been acknowledgment by end_that_request_first().
Does it? end_that_request_first can deal with nr_bytes that span bvecs
and even bios. The only thing the driver has to do then is to walk the
bvecs and bios itself, what it is already doing, but it should do this
without modifying the indexes. Since it is working on a copy of the
request, the bio pointer doesn't move. But the bvec index does. It is
set to zero after a bio is finished, which is where it most probably was
at the beginning, but might not be.
I know, end_that_request_first doesn't care in this case, and it can't
be called for every bvec because the transfer only ends after the drive
acknowledged it (everything else would be wrong), but still.
Can't another (some local) variable be used as bvec index instead of
bi_idx in the original bio? (except from ide_map_buffer using exactly
this index...)
Still, I see, mcount could go to zero before the bio is finished and we
would need to store the bvec index somewhere, I see the problem.
What about doing a partial bio completion in multwrite_intr? If there is
data left you know you've finished multcount sectors, right?
> In situation when information about previously processed bios/bvecs is needed
> (ie. error condition) this information is already lost.
Sure.
> There are 2 solutions for this problem:
>
> - Use separate bio lists (rq->cbio) and temporary data
> (rq->nr_cbio_segments and rq->nr_cbio_sectors) for submission/completion.
That would be somewhat similar to what I just proposed, right?
Would you be interested in a small patch (well, if I can come up with
one)?
> Please look at process_that_request_first() and its usage in TASKFILE code.
I'll do. I already noticed that it used the other fields and obviously
doesn't use bi_idx the same way.
> You are then required to do partial bio completion.
Yes.
> - Do not use struct request fields directly and store information needed by
> driver in separate data structures
> (ie. scatter-gather data stored by SCSI layer).
>
> You are then not allowed (and shouldn't need) to do partial bio completions.
>
> IDE multi-sector code uses first method because second one was too
> invasive/risky to be done (required serious rewrite of IDE transport code).
I can understand that. The IDE layer seems to somewhat somewhat be a
victim of adding new features and adapting it to new layers.
> I hope generic block transport layer in 2.7.x will make life simpler.
Well, I hope so. And I hope the ide devices don't end up being treated
as scsi devices. ;)
Christophe,
I am sorry but adding in a splitter to CPRM is not acceptable.
Digital Rights Management in the kernel is not acceptable to me, period.
Maybe I have misread your intent and the contents on your website.
Device-Mappers are one thing, intercepting buffers in the taskfile FSM
transport is another. This stinks of CPRM at this level, regardless of
your intent. Do correct me if I am wrong.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2 Jan 2004, Christophe Saout wrote:
> Am Fr, den 02.01.2004 schrieb Bartlomiej Zolnierkiewicz um 02:27:
>
> > > I was just investigating where bio->bi_idx gets modified in the kernel.
> > >
> > > I found these lines in ide-disk.c in ide_multwrite (DMA off, TASKFILE_IO
> > >
> > > off):
> > > > if (++bio->bi_idx >= bio->bi_vcnt) {
> > > > bio->bi_idx = 0;
> > > > bio = bio->bi_next;
> > > > }
> > >
> > > (rq->bio also gets changed but it's protected by the scratch buffer)
> > >
> > > I think changing the bi_idx here is dangerous because
> > > end_that_request_first needs this value to be unchanged because it
> > > tracks the progress of the bio processing and updates bi_idx itself.
> >
> > This is not a problem here because ide_multwrite() walks rq->bio chain itself.
> > It also updates current_nr_sectors and hard_cur_sectors fields of drive->wrq.
>
> Yes, I've seen this. That looks okay.
>
> > > And bio->bi_idx = 0 is probably wrong because the bio can be submitted
> > > with bio->bi_idx > 0 (if the bio was splitted and there are clones that
> > > share the bio_vec array, like raid or device-mapper code).
> > >
> > > If it really needs to play with bi_idx itself care should be taken to
> > > reset bi_idx to the original value, not to zero.
> >
> > RAID or device-mapper code doesn't seem to care about bio->bi_idx
> > value after bio has been submitted to the block layer, so the current
> > code is safe enough.
>
> Yes, that's right. But I'd like to. I see that the code works this way,
> but still it's somewhat incorrect and I'll run into trouble if I want to
> do a certain thing. Well, you could simply say "then don't do it", but
> hey. ;)
>
> I'm working on a dm encryption target. So I need to allocate and manager
> buffers. Under memory pressure (or if dm decided before thta) it can
> happen that a bio is split up. But then to avoid deadlocks due to memory
> shortage I need to free my buffers up as soon as possible. If a bio
> returns (it doesn't even need to be a partial completion) I need to know
> which pages I can free.
>
> The way I would prefer is that when someone calls bio_endio the bi_idx
> and bv_offset just point where the processed data begins.
>
> Most drivers complete a bio at once and leave bi_idx where it was.
> That's fine. With a very small modification end_that_request_first can
> also follow the rule that I just outlined.
>
> I implemented the buffer free mechanism this way and it works fine. I
> added a lot of debug to make sure all pages get freed correctly and
> funny enough everything works fine, I'm not even able to trigger
> problems where I expect them. But I still don't trust this.
>
> > Also there is no place to store original bi_idx.
>
> That seems to be the key problem, sometimes. The other thing I could do
> is to use bi_vcnt and bi_size and then go backwards through the bvecs to
> find the pages and ignore the whole bi_idx issue. But this is ugly.
>
> > After finishing data transfer multwrite_intr() calls ide_end_request()
> > with rq->nr_sectors argument (where rq is hwgroup->rq not drive->wrq),
> > so only whole bios are completed. There are no partial completions
> > and code depending on bio->bi_idx inside __end_that_request_first()
> > is not executed.
>
> Yes, I suspected this. This part hardly seems to be ever used.
>
> > The real (generic) problem is that atomic block segment for a block device
> > (in this case ATA disk) can be composed of bvecs, bios or bios+bvecs and
> > driver can obtain information about next bvec from block layer (from rq->bio)
> > only after previous bvec has been acknowledgment by end_that_request_first().
>
> Does it? end_that_request_first can deal with nr_bytes that span bvecs
> and even bios. The only thing the driver has to do then is to walk the
> bvecs and bios itself, what it is already doing, but it should do this
> without modifying the indexes. Since it is working on a copy of the
> request, the bio pointer doesn't move. But the bvec index does. It is
> set to zero after a bio is finished, which is where it most probably was
> at the beginning, but might not be.
>
> I know, end_that_request_first doesn't care in this case, and it can't
> be called for every bvec because the transfer only ends after the drive
> acknowledged it (everything else would be wrong), but still.
>
> Can't another (some local) variable be used as bvec index instead of
> bi_idx in the original bio? (except from ide_map_buffer using exactly
> this index...)
>
> Still, I see, mcount could go to zero before the bio is finished and we
> would need to store the bvec index somewhere, I see the problem.
>
> What about doing a partial bio completion in multwrite_intr? If there is
> data left you know you've finished multcount sectors, right?
>
> > In situation when information about previously processed bios/bvecs is needed
> > (ie. error condition) this information is already lost.
>
> Sure.
>
> > There are 2 solutions for this problem:
> >
> > - Use separate bio lists (rq->cbio) and temporary data
> > (rq->nr_cbio_segments and rq->nr_cbio_sectors) for submission/completion.
>
> That would be somewhat similar to what I just proposed, right?
>
> Would you be interested in a small patch (well, if I can come up with
> one)?
>
> > Please look at process_that_request_first() and its usage in TASKFILE code.
>
> I'll do. I already noticed that it used the other fields and obviously
> doesn't use bi_idx the same way.
>
> > You are then required to do partial bio completion.
>
> Yes.
>
> > - Do not use struct request fields directly and store information needed by
> > driver in separate data structures
> > (ie. scatter-gather data stored by SCSI layer).
> >
> > You are then not allowed (and shouldn't need) to do partial bio completions.
> >
> > IDE multi-sector code uses first method because second one was too
> > invasive/risky to be done (required serious rewrite of IDE transport code).
>
> I can understand that. The IDE layer seems to somewhat somewhat be a
> victim of adding new features and adapting it to new layers.
>
> > I hope generic block transport layer in 2.7.x will make life simpler.
>
> Well, I hope so. And I hope the ide devices don't end up being treated
> as scsi devices. ;)
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
On Thu, Jan 01 2004, Andre Hedrick wrote:
>
> Christophe,
>
> I am sorry but adding in a splitter to CPRM is not acceptable.
> Digital Rights Management in the kernel is not acceptable to me, period.
>
> Maybe I have misread your intent and the contents on your website.
>
> Device-Mappers are one thing, intercepting buffers in the taskfile FSM
> transport is another. This stinks of CPRM at this level, regardless of
> your intent. Do correct me if I am wrong.
0 2 4 6 8 10
/
/
/
/
/
/
/
PARANOIA-METER
--
Jens Axboe
Am Fr, den 02.01.2004 schrieb Andre Hedrick um 05:43:
> I am sorry but adding in a splitter to CPRM is not acceptable.
> Digital Rights Management in the kernel is not acceptable to me, period.
>
> Maybe I have misread your intent and the contents on your website.
>
> Device-Mappers are one thing, intercepting buffers in the taskfile FSM
> transport is another. This stinks of CPRM at this level, regardless of
> your intent. Do correct me if I am wrong.
I can assure you I was never having DRM or anything like this in mind
nor making fundamental changes to the IDE layer. It was just that
++bi_idx that bugged me. Must be a misunderstanding, sorry. :)
The only thing I'm having on my website is a device-mapper target that
does basically the same as cryptoloop tries to. It's just about
encrypting sensitive data on top of any other device, nothing else.
Jens,
Cute, so when will it be Jens Axboe <[email protected]>.
When I ask for your opinion I will reach around and wipe for it :-)
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2 Jan 2004, Jens Axboe wrote:
> On Thu, Jan 01 2004, Andre Hedrick wrote:
> >
> > Christophe,
> >
> > I am sorry but adding in a splitter to CPRM is not acceptable.
> > Digital Rights Management in the kernel is not acceptable to me, period.
> >
> > Maybe I have misread your intent and the contents on your website.
> >
> > Device-Mappers are one thing, intercepting buffers in the taskfile FSM
> > transport is another. This stinks of CPRM at this level, regardless of
> > your intent. Do correct me if I am wrong.
>
> 0 2 4 6 8 10
> /
> /
> /
> /
> /
> /
> /
> PARANOIA-METER
>
> --
> Jens Axboe
>
Christophe,
Fair enough, and point taken. Thanks for the clarification.
I am puzzled by the need to modify buffers on the fly inside the FSM.
This is straight out of some unpublished information, so it struck a raw
nerve. Just be aware this tends to follow the design and could be used
for such.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Fri, 2 Jan 2004, Christophe Saout wrote:
> Am Fr, den 02.01.2004 schrieb Andre Hedrick um 05:43:
>
> > I am sorry but adding in a splitter to CPRM is not acceptable.
> > Digital Rights Management in the kernel is not acceptable to me, period.
> >
> > Maybe I have misread your intent and the contents on your website.
> >
> > Device-Mappers are one thing, intercepting buffers in the taskfile FSM
> > transport is another. This stinks of CPRM at this level, regardless of
> > your intent. Do correct me if I am wrong.
>
> I can assure you I was never having DRM or anything like this in mind
> nor making fundamental changes to the IDE layer. It was just that
> ++bi_idx that bugged me. Must be a misunderstanding, sorry. :)
>
> The only thing I'm having on my website is a device-mapper target that
> does basically the same as cryptoloop tries to. It's just about
> encrypting sensitive data on top of any other device, nothing else.
>
>
On Fri, Jan 02 2004, Andre Hedrick wrote:
>
> Jens,
>
> Cute, so when will it be Jens Axboe <[email protected]>.
>
> When I ask for your opinion I will reach around and wipe for it :-)
That's a problem you can solve yourself - if you'd stop venting your
repetitive, unsolicited, and endlessly tiresome paranoia theories in
public, I wouldn't have anything to comment on, would I? :)
--
Jens Axboe
Jens,
Would that need bleach, detergent, or softener?
You should try the mirror as my original concerns were addressed to
Christophe, and there are to many letters in his name for even you to
confuse spellings. :-)
Cheers,
Andre Hedrick
LAD Storage Consulting Group
On Sat, 3 Jan 2004, Jens Axboe wrote:
> On Fri, Jan 02 2004, Andre Hedrick wrote:
> >
> > Jens,
> >
> > Cute, so when will it be Jens Axboe <[email protected]>.
> >
> > When I ask for your opinion I will reach around and wipe for it :-)
>
> That's a problem you can solve yourself - if you'd stop venting your
> repetitive, unsolicited, and endlessly tiresome paranoia theories in
> public, I wouldn't have anything to comment on, would I? :)
>
> --
> Jens Axboe
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On Friday 02 of January 2004 04:20, Christophe Saout wrote:
> Am Fr, den 02.01.2004 schrieb Bartlomiej Zolnierkiewicz um 02:27:
> > > I was just investigating where bio->bi_idx gets modified in the kernel.
> > >
> > > I found these lines in ide-disk.c in ide_multwrite (DMA off,
> > > TASKFILE_IO
> > >
> > > off):
> > > > if (++bio->bi_idx >= bio->bi_vcnt) {
> > > > bio->bi_idx = 0;
> > > > bio = bio->bi_next;
> > > > }
> > >
> > > (rq->bio also gets changed but it's protected by the scratch buffer)
> > >
> > > I think changing the bi_idx here is dangerous because
> > > end_that_request_first needs this value to be unchanged because it
> > > tracks the progress of the bio processing and updates bi_idx itself.
> >
> > This is not a problem here because ide_multwrite() walks rq->bio chain
> > itself. It also updates current_nr_sectors and hard_cur_sectors fields of
> > drive->wrq.
>
> Yes, I've seen this. That looks okay.
>
> > > And bio->bi_idx = 0 is probably wrong because the bio can be submitted
> > > with bio->bi_idx > 0 (if the bio was splitted and there are clones that
> > > share the bio_vec array, like raid or device-mapper code).
> > >
> > > If it really needs to play with bi_idx itself care should be taken to
> > > reset bi_idx to the original value, not to zero.
> >
> > RAID or device-mapper code doesn't seem to care about bio->bi_idx
> > value after bio has been submitted to the block layer, so the current
> > code is safe enough.
>
> Yes, that's right. But I'd like to. I see that the code works this way,
> but still it's somewhat incorrect and I'll run into trouble if I want to
> do a certain thing. Well, you could simply say "then don't do it", but
> hey. ;)
>
> I'm working on a dm encryption target. So I need to allocate and manager
> buffers. Under memory pressure (or if dm decided before thta) it can
> happen that a bio is split up. But then to avoid deadlocks due to memory
> shortage I need to free my buffers up as soon as possible. If a bio
> returns (it doesn't even need to be a partial completion) I need to know
> which pages I can free.
>
> The way I would prefer is that when someone calls bio_endio the bi_idx
> and bv_offset just point where the processed data begins.
Are you aware that this will make partial completions illegal?
[ No problem for me. ]
> Most drivers complete a bio at once and leave bi_idx where it was.
> That's fine. With a very small modification end_that_request_first can
> also follow the rule that I just outlined.
>
> I implemented the buffer free mechanism this way and it works fine. I
> added a lot of debug to make sure all pages get freed correctly and
> funny enough everything works fine, I'm not even able to trigger
> problems where I expect them. But I still don't trust this.
>
> > Also there is no place to store original bi_idx.
>
> That seems to be the key problem, sometimes. The other thing I could do
> is to use bi_vcnt and bi_size and then go backwards through the bvecs to
> find the pages and ignore the whole bi_idx issue. But this is ugly.
>
> > After finishing data transfer multwrite_intr() calls ide_end_request()
> > with rq->nr_sectors argument (where rq is hwgroup->rq not drive->wrq),
> > so only whole bios are completed. There are no partial completions
> > and code depending on bio->bi_idx inside __end_that_request_first()
> > is not executed.
>
> Yes, I suspected this. This part hardly seems to be ever used.
>
> > The real (generic) problem is that atomic block segment for a block
> > device (in this case ATA disk) can be composed of bvecs, bios or
> > bios+bvecs and driver can obtain information about next bvec from block
> > layer (from rq->bio) only after previous bvec has been acknowledgment by
> > end_that_request_first().
>
> Does it? end_that_request_first can deal with nr_bytes that span bvecs
> and even bios. The only thing the driver has to do then is to walk the
> bvecs and bios itself, what it is already doing, but it should do this
> without modifying the indexes. Since it is working on a copy of the
> request, the bio pointer doesn't move. But the bvec index does. It is
> set to zero after a bio is finished, which is where it most probably was
> at the beginning, but might not be.
>
> I know, end_that_request_first doesn't care in this case, and it can't
> be called for every bvec because the transfer only ends after the drive
> acknowledged it (everything else would be wrong), but still.
>
> Can't another (some local) variable be used as bvec index instead of
> bi_idx in the original bio? (except from ide_map_buffer using exactly
> this index...)
see rq_map_buffer() in include/linux/blkdev.h
> Still, I see, mcount could go to zero before the bio is finished and we
> would need to store the bvec index somewhere, I see the problem.
bvec index and offset
> What about doing a partial bio completion in multwrite_intr? If there is
> data left you know you've finished multcount sectors, right?
Not always, ie. no. of sectors equal to no. of multicount sectors.
> > In situation when information about previously processed bios/bvecs is
> > needed (ie. error condition) this information is already lost.
>
> Sure.
>
> > There are 2 solutions for this problem:
> >
> > - Use separate bio lists (rq->cbio) and temporary data
> > (rq->nr_cbio_segments and rq->nr_cbio_sectors) for
> > submission/completion.
>
> That would be somewhat similar to what I just proposed, right?
Right, rq->nr_cbio_segments holds number of bvecs still to be processed
(no need to change bio->bi_idx) and rq->nr_cbio_sectors number of sectors
in the bio still to be proccessed (so rq->current_nr_sectors can be number
of sectors still to do in the current bvec).
Please note that this method doesn't require copy of struct request
(using scratch request copy is quite expensive).
> Would you be interested in a small patch (well, if I can come up with
> one)?
Sure, but I don't know what you want to change... :-)
> > Please look at process_that_request_first() and its usage in TASKFILE
> > code.
>
> I'll do. I already noticed that it used the other fields and obviously
> doesn't use bi_idx the same way.
>
> > You are then required to do partial bio completion.
>
> Yes.
Actually no, my mistake... s/required/allowed/
IDE taskfile code doesn't use partial completions.
--bart
On Sat, Jan 03 2004, Andre Hedrick wrote:
>
> Jens,
>
> Would that need bleach, detergent, or softener?
> You should try the mirror as my original concerns were addressed to
Your 'original concerns' were not original, they are the same crap you
spout any chance you get. I don't even remember when I last saw an email
from you with actual content. Maybe I should just get it over and done
with and finally black list your email.
> Christophe, and there are to many letters in his name for even you to
> confuse spellings. :-)
Midly amusing, you talking about spelling.
I dunno why I keep getting suckered into this, it's obvious you're
looking for these fights. I'll stop taking the bait now.
--
Jens Axboe
Am Sa, den 03.01.2004 schrieb Bartlomiej Zolnierkiewicz um 23:02:
> > The way I would prefer is that when someone calls bio_endio the bi_idx
> > and bv_offset just point where the processed data begins.
>
> Are you aware that this will make partial completions illegal?
> [ No problem for me. ]
Why that? __end_that_request_first already does this (when moving thw
two lines updating bv_offset/bv_len after the call of the bi_end_io
function).
> > Can't another (some local) variable be used as bvec index instead of
> > bi_idx in the original bio? (except from ide_map_buffer using exactly
> > this index...)
>
> see rq_map_buffer() in include/linux/blkdev.h
Right. I've been going through ide-taskfile.c for the last hours.
The IDE_TASKFILE_IO gets things right (from my point of view) and is
also much cleaner. (I would personally vote for dropping the non
TASKFILE_IO code, it would make my problem go away :D - why is it still
marked as experimental BTW? I've been using it since it was introduced,
without any problems)
BTW: The taskfile code that is used when IDE_TASKFILE_IO is disabled
might partially end requests without knowing the actual status, right?
> /*
> * FIXME :: We really can not legally get a new page/bh
> * regardless, if this is the end of our segment.
> * BH walking or segment can only be updated after we
> * have a good hwif->INB(IDE_STATUS_REG); return.
> */
> if (!rq->current_nr_sectors) {
> if (!DRIVER(drive)->end_request(drive, 1, 0))
> if (!rq->bio)
> return ide_stopped;
> }
> } while (msect);
Well, there's a FIXME so you know this is not legal, but to make sure.
In ide-disk.c you're walking the segments yourself using the original
bi_idx which avoids this problem but which is my original problem. And
TASKFILE_IO gets things right (from my point of view) and doesn't do
illegal things because it uses the "generic driver walking code" using
cbio/process_that_request_first and co.
So non TASKFILE_IO code has two multout codepaths (taskfile and not)
that are both "awkward" while TASKFILE_IO merges both into a single and
clean version.
> > Still, I see, mcount could go to zero before the bio is finished and we
> > would need to store the bvec index somewhere, I see the problem.
>
> bvec index and offset
Exactly.
> > What about doing a partial bio completion in multwrite_intr? If there is
> > data left you know you've finished multcount sectors, right?
>
> Not always, ie. no. of sectors equal to no. of multicount sectors.
Yes, I didn't think about this one.
> > > There are 2 solutions for this problem:
> > >
> > > - Use separate bio lists (rq->cbio) and temporary data
> > > (rq->nr_cbio_segments and rq->nr_cbio_sectors) for
> > > submission/completion.
> >
> > That would be somewhat similar to what I just proposed, right?
>
> Right, rq->nr_cbio_segments holds number of bvecs still to be processed
> (no need to change bio->bi_idx) and rq->nr_cbio_sectors number of sectors
> in the bio still to be proccessed (so rq->current_nr_sectors can be number
> of sectors still to do in the current bvec).
>
> Please note that this method doesn't require copy of struct request
> (using scratch request copy is quite expensive).
Yes. There's a memcpy commented out (#if 0) in ide-taskfile.c which you
don't ned because you "illegaly" let end_request (and so
end_that_request_first) to walk the request for you.
Using the cbio & co. mechanism you can let process_that_request_first
walk the code for you ("legally") without needing the copy either.
> > Would you be interested in a small patch (well, if I can come up with
> > one)?
>
> Sure, but I don't know what you want to change... :-)
I'm not yet sure, either. I don't think that a too invasive version
would be adequate though converting this mess to the cbio method would
be nice. Or would you prefer to see that? I don't think it's worth
starting on that since you said you'de like to see this part of the IDE
layer die in 2.7 anyway. I would really like to see ide_map_buffer die
in favor of rq_map_buffer though. Hmm.
Perhaps I can think of something else. It's really tricky...
> > > Please look at process_that_request_first() and its usage in TASKFILE
> > > code.
> >
> > I'll do. I already noticed that it used the other fields and obviously
> > doesn't use bi_idx the same way.
> >
> > > You are then required to do partial bio completion.
> >
> > Yes.
>
> Actually no, my mistake... s/required/allowed/
> IDE taskfile code doesn't use partial completions.
Not partial completions of bios but partial completion of requests,
right?
Things like
> while (rq->bio != rq->cbio)
> if (!DRIVER(drive)->end_request(drive, 1, bio_sectors(rq->bio)))
> return ide_stopped;
in the interrupt handlers if you know they suceedeed.
Partial bio completions would probably also be possible, I see, but I
don't need to or want to change that.
Okay.
I'm trying to figure something out to avoid my original ++bio->bi_idx
problem.
Jens,
I'm sorry was there any mention of "binary" or "GPL", no I do not think
so. Was this a thread on licenses? Those have been the subjects I have
enjoyed baiting folks.
This was all about a concern about altering the buffers in the middle of
the FSM, and I owe both Christophe and Bartlomiej each an apology for
not removing them from when you tossed out the live bait of the
paranoid-o-meter. Had I managed to restrain myself as I have done in the
recent past (however freely admit failing many times earlier), replying to
your need to "One-Up-Manship" game was foolish.
I am not here to have the last word, for that I reserve that right for you
to smash and bash to your little heart's content. There is a time and a
place to dispose of my issues with you over the past 5 years, but here and
now is not it. With any luck I will forget about them.
Take care 'Master' Axboe, you have come into yourself.
Regards,
Andre Hedrick
LAD Storage Consulting Group
PS I know you can do it, please take the last word, just one more time,
please.
On Sun, 4 Jan 2004, Jens Axboe wrote:
> On Sat, Jan 03 2004, Andre Hedrick wrote:
> >
> > Jens,
> >
> > Would that need bleach, detergent, or softener?
> > You should try the mirror as my original concerns were addressed to
>
> Your 'original concerns' were not original, they are the same crap you
> spout any chance you get. I don't even remember when I last saw an email
> from you with actual content. Maybe I should just get it over and done
> with and finally black list your email.
>
> > Christophe, and there are to many letters in his name for even you to
> > confuse spellings. :-)
>
> Midly amusing, you talking about spelling.
>
> I dunno why I keep getting suckered into this, it's obvious you're
> looking for these fights. I'll stop taking the bait now.
>
> --
> Jens Axboe
>
Hi Bartlomiej,
I've been playing with the code a bit.
I simple (but not really cleaner) solution to my original problem that
is minimal invasive is to use an unused field from struct request, in
this case nr_cbio_segments. It actually really counts the remaining
segments, only in rq->bio instead of rq->cbio. (The bio_kmap_irq
in ide_map_buffer prevents us from using anything else than rq->bio to
walk the request).
This segment count is then used to correctly restore the bi_idx fields
before ending requests instead of assuming they were zero.
(Well, I just also see that you could probably drop the scratch buffer
and just copy rq->cbio to rq->bio before ending the request... brrr,
no, that's just too ugly...)
BTW, what was ide_multwrite expected to return? These if clauses in
multwrite_intr are never executed.
Please don't shoot me for this proposal.
--- linux.orig/drivers/ide/ide-disk.c 2004-01-04 23:29:01.000000000 +0100
+++ linux/drivers/ide/ide-disk.c 2004-01-04 23:32:32.000000000 +0100
@@ -279,7 +279,7 @@
* all bvecs in this one.
*/
if (++bio->bi_idx >= bio->bi_vcnt) {
- bio->bi_idx = 0;
+ bio->bi_idx = bio->bi_vcnt - bio->nr_cbio_segments;
bio = bio->bi_next;
}
@@ -288,7 +288,8 @@
mcount = 0;
} else {
rq->bio = bio;
- rq->current_nr_sectors = bio_iovec(bio)->bv_len >> 9;
+ rq->nr_cbio_segments = bio_segments(bio);
+ rq->current_nr_sectors = bio_cur_sectors(bio);
rq->hard_cur_sectors = rq->current_nr_sectors;
}
}
@@ -312,6 +313,7 @@
ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
struct request *rq = &hwgroup->wrq;
+ struct bio *bio = rq->bio;
u8 stat;
stat = hwif->INB(IDE_STATUS_REG);
@@ -322,8 +324,10 @@
* of the request
*/
if (rq->nr_sectors) {
- if (ide_multwrite(drive, drive->mult_count))
+ if (ide_multwrite(drive, drive->mult_count)) {
+ bio->bi_idx = bio->bi_vcnt - bio->nr_cbio_segments;
return ide_stopped;
+ }
ide_set_handler(drive, &multwrite_intr, WAIT_CMD, NULL);
return ide_started;
}
@@ -333,6 +337,7 @@
* we can end the original request.
*/
if (!rq->nr_sectors) { /* all done? */
+ bio->bi_idx = bio->bi_vcnt - bio->nr_cbio_segments;
rq = hwgroup->rq;
ide_end_request(drive, 1, rq->nr_sectors);
return ide_stopped;
Hi again,
here another experiment.
I have started moving the code over to using the cbio mechanism. I
have only touched ide-disk.c though, I'm not sure about ide-taskfile.c.
So basically I've replaced bio_kmap_irq with rq_map_buffer in
ide_map_buffer and changed the manual rq->current_nr_sectors/nr_sectors
fiddling with process_that_request_first. I can then recognize whether
a bio border has been crossed if rq->cbio differs from rq->bio since
the rq->current_nr_sectors already might refer to the next bio and won't
drop to zero.
The modifications work here with and without multmode and with all kinds
of bios. Haven't been able to test error conditions since I don't have
broken hardware. ;-)
I also didn't touch ide-taskfile.c which has most probably also been
broken by the ide_map_buffer change. And I stumbled across the code
calling end_request with a null sector count, ide_end_request will then
take hard_nr_sectors which will end the whole request even if only one
bio was finished, huh? Am I missing something here?
And when is bio == NULL in ide_map_buffer? Where can this happen?
bio to cbio
--- linux.orig/drivers/ide/ide-disk.c 2004-01-04 23:43:59.000000000 +0100
+++ linux/drivers/ide/ide-disk.c 2004-01-05 04:17:25.522691784 +0100
@@ -172,11 +172,11 @@
(unsigned long) rq->buffer+(nsect<<9), rq->nr_sectors-nsect);
#endif
ide_unmap_buffer(rq, to, &flags);
- rq->sector += nsect;
+ process_that_request_first(rq, nsect);
rq->errors = 0;
- i = (rq->nr_sectors -= nsect);
- if (((long)(rq->current_nr_sectors -= nsect)) <= 0)
- ide_end_request(drive, 1, rq->hard_cur_sectors);
+ i = rq->nr_sectors;
+ if (rq->bio != rq->cbio)
+ ide_end_request(drive, 1, bio_sectors(rq->bio));
/*
* Another BH Page walker and DATA INTEGRITY Questioned on ERROR.
* If passed back up on multimode read, BAD DATA could be ACKED
@@ -195,7 +195,7 @@
* write_intr() is the handler for disk write interrupts
*/
static ide_startstop_t write_intr (ide_drive_t *drive)
-{
+{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
struct request *rq = hwgroup->rq;
@@ -213,12 +213,11 @@
rq->nr_sectors-1);
#endif
if ((rq->nr_sectors == 1) ^ ((stat & DRQ_STAT) != 0)) {
- rq->sector++;
+ process_that_request_first(rq, 1);
rq->errors = 0;
- i = --rq->nr_sectors;
- --rq->current_nr_sectors;
- if (((long)rq->current_nr_sectors) <= 0)
- ide_end_request(drive, 1, rq->hard_cur_sectors);
+ i = rq->nr_sectors;
+ if (rq->bio != rq->cbio)
+ ide_end_request(drive, 1, bio_sectors(rq->bio));
if (i > 0) {
unsigned long flags;
char *to = ide_map_buffer(rq, &flags);
@@ -245,53 +244,25 @@
* and IRQ context. The IRQ can happen any time after we've output the
* full "mcount" number of sectors, so we must make sure we update the
* state _before_ we output the final part of the data!
- *
- * The update and return to BH is a BLOCK Layer Fakey to get more data
- * to satisfy the hardware atomic segment. If the hardware atomic segment
- * is shorter or smaller than the BH segment then we should be OKAY.
- * This is only valid if we can rewind the rq->current_nr_sectors counter.
*/
int ide_multwrite (ide_drive_t *drive, unsigned int mcount)
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
- struct request *rq = &hwgroup->wrq;
+ struct request *rq = hwgroup->rq;
do {
char *buffer;
- int nsect = rq->current_nr_sectors;
unsigned long flags;
+ int nsect = rq->current_nr_sectors;
if (nsect > mcount)
nsect = mcount;
mcount -= nsect;
buffer = ide_map_buffer(rq, &flags);
- rq->sector += nsect;
- rq->nr_sectors -= nsect;
- rq->current_nr_sectors -= nsect;
-
- /* Do we move to the next bh after this? */
- if (!rq->current_nr_sectors) {
- struct bio *bio = rq->bio;
-
- /*
- * only move to next bio, when we have processed
- * all bvecs in this one.
- */
- if (++bio->bi_idx >= bio->bi_vcnt) {
- bio->bi_idx = 0;
- bio = bio->bi_next;
- }
-
- /* end early early we ran out of requests */
- if (!bio) {
- mcount = 0;
- } else {
- rq->bio = bio;
- rq->current_nr_sectors = bio_iovec(bio)->bv_len >> 9;
- rq->hard_cur_sectors = rq->current_nr_sectors;
- }
- }
+ process_that_request_first(rq, nsect);
+ if (!rq->cbio)
+ mcount = 0;
/*
* Ok, we're all setup for the interrupt
@@ -311,7 +282,7 @@
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
- struct request *rq = &hwgroup->wrq;
+ struct request *rq = hwgroup->rq;
u8 stat;
stat = hwif->INB(IDE_STATUS_REG);
@@ -333,12 +304,11 @@
* we can end the original request.
*/
if (!rq->nr_sectors) { /* all done? */
- rq = hwgroup->rq;
- ide_end_request(drive, 1, rq->nr_sectors);
+ ide_end_request(drive, 1, rq->hard_nr_sectors);
return ide_stopped;
}
}
- /* the original code did this here (?) */
+ /* the original code did this here (?) */
return ide_stopped;
}
return DRIVER(drive)->error(drive, "multwrite_intr", stat);
@@ -517,7 +487,9 @@
*
* MAJOR DATA INTEGRITY BUG !!! only if we error
*/
+#if 0
hwgroup->wrq = *rq; /* scratchpad */
+#endif
ide_set_handler(drive, &multwrite_intr, WAIT_CMD, NULL);
if (ide_multwrite(drive, drive->mult_count)) {
unsigned long flags;
--- linux.orig/include/linux/ide.h 2004-01-04 23:43:59.000000000 +0100
+++ linux/include/linux/ide.h 2004-01-05 04:17:38.287751200 +0100
@@ -835,7 +835,7 @@
* fs request
*/
if (rq->bio)
- return bio_kmap_irq(rq->bio, flags) + ide_rq_offset(rq);
+ return rq_map_buffer(rq, flags);
/*
* task request
@@ -846,7 +846,7 @@
static inline void ide_unmap_buffer(struct request *rq, char *buffer, unsigned long *flags)
{
if (rq->bio)
- bio_kunmap_irq(buffer, flags);
+ rq_unmap_buffer(buffer, flags);
}
#endif /* !CONFIG_IDE_TASKFILE_IO */
@@ -1057,8 +1057,10 @@
struct request *rq;
/* failsafe timer */
struct timer_list timer;
+#if 0
/* local copy of current write rq */
struct request wrq;
+#endif
/* timeout value during long polls */
unsigned long poll_timeout;
/* queried upon timeouts */
On Sun, Jan 04 2004, Andre Hedrick wrote:
>
> Jens,
>
> I'm sorry was there any mention of "binary" or "GPL", no I do not think
> so. Was this a thread on licenses? Those have been the subjects I have
> enjoyed baiting folks.
>
> This was all about a concern about altering the buffers in the middle of
> the FSM, and I owe both Christophe and Bartlomiej each an apology for
> not removing them from when you tossed out the live bait of the
> paranoid-o-meter. Had I managed to restrain myself as I have done in the
> recent past (however freely admit failing many times earlier), replying to
> your need to "One-Up-Manship" game was foolish.
>
> I am not here to have the last word, for that I reserve that right for you
> to smash and bash to your little heart's content. There is a time and a
> place to dispose of my issues with you over the past 5 years, but here and
> now is not it. With any luck I will forget about them.
You are so full of it it's amazing.
--
Jens Axboe
On Sunday 04 of January 2004 18:30, Christophe Saout wrote:
> Am Sa, den 03.01.2004 schrieb Bartlomiej Zolnierkiewicz um 23:02:
> > > The way I would prefer is that when someone calls bio_endio the bi_idx
> > > and bv_offset just point where the processed data begins.
> >
> > Are you aware that this will make partial completions illegal?
> > [ No problem for me. ]
>
> Why that? __end_that_request_first already does this (when moving thw
> two lines updating bv_offset/bv_len after the call of the bi_end_io
> function).
Looking once again, I see it is OK.
> > > Can't another (some local) variable be used as bvec index instead of
> > > bi_idx in the original bio? (except from ide_map_buffer using exactly
> > > this index...)
> >
> > see rq_map_buffer() in include/linux/blkdev.h
>
> Right. I've been going through ide-taskfile.c for the last hours.
>
> The IDE_TASKFILE_IO gets things right (from my point of view) and is
> also much cleaner. (I would personally vote for dropping the non
> TASKFILE_IO code, it would make my problem go away :D - why is it still
> marked as experimental BTW? I've been using it since it was introduced,
> without any problems)
There are still some issues to be resolved:
- hangs during reading /proc/ide/<cdrom>/identify on some drives
(workaround is now known thanks to debugging done by Andi+BenH+Andre)
- unexplained fs corruption on x86-64 with AMD IDE chipsets
(the real showstopper)
- somebody needs to test taskfile code on old Promise PDC4030 controller
(low priority)
> BTW: The taskfile code that is used when IDE_TASKFILE_IO is disabled
> might partially end requests without knowing the actual status, right?
Right.
> So non TASKFILE_IO code has two multout codepaths (taskfile and not)
> that are both "awkward" while TASKFILE_IO merges both into a single and
> clean version.
Yes.
> > > Would you be interested in a small patch (well, if I can come up with
> > > one)?
> >
> > Sure, but I don't know what you want to change... :-)
>
> I'm not yet sure, either. I don't think that a too invasive version
> would be adequate though converting this mess to the cbio method would
> be nice. Or would you prefer to see that? I don't think it's worth
> starting on that since you said you'de like to see this part of the IDE
> layer die in 2.7 anyway. I would really like to see ide_map_buffer die
> in favor of rq_map_buffer though. Hmm.
> Perhaps I can think of something else. It's really tricky...
I would like to remove non CONFIG_IDE_TASKFILE_IO paths in 2.6.x
(after issues are resolved) instead of trying to fix them.
--bart
On Monday 05 of January 2004 05:03, Christophe Saout wrote:
> Hi again,
>
> here another experiment.
>
> I have started moving the code over to using the cbio mechanism. I
> have only touched ide-disk.c though, I'm not sure about ide-taskfile.c.
Patch looks nice but I wonder if it is worth doing
(this code should die ASAP).
> The modifications work here with and without multmode and with all kinds
> of bios. Haven't been able to test error conditions since I don't have
> broken hardware. ;-)
Hehe... you can try to break it ;-).
> I also didn't touch ide-taskfile.c which has most probably also been
> broken by the ide_map_buffer change. And I stumbled across the code
Yep.
> calling end_request with a null sector count, ide_end_request will then
> take hard_nr_sectors which will end the whole request even if only one
> bio was finished, huh? Am I missing something here?
No, it is used mainly to fail requests.
This hack should be later removed with care
(there is some strange comment about locking).
> And when is bio == NULL in ide_map_buffer? Where can this happen?
In taskfile code - special requests are not bio baked (taskfile ioctl).
--bart
Am Mo, den 05.01.2004 schrieb Bartlomiej Zolnierkiewicz um 17:12:
> > The IDE_TASKFILE_IO gets things right (from my point of view) and is
> > also much cleaner. (I would personally vote for dropping the non
> > TASKFILE_IO code, it would make my problem go away :D - why is it still
> > marked as experimental BTW? I've been using it since it was introduced,
> > without any problems)
>
> There are still some issues to be resolved:
> - hangs during reading /proc/ide/<cdrom>/identify on some drives
> (workaround is now known thanks to debugging done by Andi+BenH+Andre)
> - unexplained fs corruption on x86-64 with AMD IDE chipsets
> (the real showstopper)
> - somebody needs to test taskfile code on old Promise PDC4030 controller
> (low priority)
Unexplained corruptions. Coder's nightmare. ;) Yes, that's always really
bad. Unfortunately I don't have such a machine so I can't help trying to
nail it down.
> > Perhaps I can think of something else. It's really tricky...
>
> I would like to remove non CONFIG_IDE_TASKFILE_IO paths in 2.6.x
> (after issues are resolved) instead of trying to fix them.
Sure, we agree on that point. I think everyone does.
BTW, I've found a not too complicated workaround for my particular
original problem so the bi_idx issue isn't a showstopper for my
device-mapper target. But apart from that the rewinding bi_idx to zero
thing still gives me headaches just being there. A small non-invasive
workaround won't make it much worse, it's hopefully going to die anyway.
Thanks for being patient with me. :-)
(and not raising new paranoia theories ;-))
On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> > calling end_request with a null sector count, ide_end_request will then
> > take hard_nr_sectors which will end the whole request even if only one
> > bio was finished, huh? Am I missing something here?
>
> No, it is used mainly to fail requests.
>
> This hack should be later removed with care
> (there is some strange comment about locking).
IIRC, it's due to it not always being safe to inspect rq state outside
of ide_lock. So that makes 0 a magic value that just means 'end the
first chunk' for ide_end_request().
--
Jens Axboe
On Monday 05 of January 2004 17:49, Jens Axboe wrote:
> On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> > > calling end_request with a null sector count, ide_end_request will then
> > > take hard_nr_sectors which will end the whole request even if only one
> > > bio was finished, huh? Am I missing something here?
> >
> > No, it is used mainly to fail requests.
> >
> > This hack should be later removed with care
> > (there is some strange comment about locking).
>
> IIRC, it's due to it not always being safe to inspect rq state outside
> of ide_lock. So that makes 0 a magic value that just means 'end the
> first chunk' for ide_end_request().
Why/when it is not safe to do?
On Monday 05 of January 2004 04:52, Christophe Saout wrote:
> BTW, what was ide_multwrite expected to return? These if clauses in
> multwrite_intr are never executed.
Dunno. It can't fail so it should be made void.
Please also add bio->bi_idx restoring for failed requests.
Put it before DRIVER(drive)->error() (and remember about if (bio) check).
Otherwise I patch is OK for me.
--bart
On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> On Monday 05 of January 2004 17:49, Jens Axboe wrote:
> > On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> > > > calling end_request with a null sector count, ide_end_request will then
> > > > take hard_nr_sectors which will end the whole request even if only one
> > > > bio was finished, huh? Am I missing something here?
> > >
> > > No, it is used mainly to fail requests.
> > >
> > > This hack should be later removed with care
> > > (there is some strange comment about locking).
> >
> > IIRC, it's due to it not always being safe to inspect rq state outside
> > of ide_lock. So that makes 0 a magic value that just means 'end the
> > first chunk' for ide_end_request().
>
> Why/when it is not safe to do?
You would need to read hwgroup->rq.
--
Jens Axboe
On Monday 05 of January 2004 19:16, Jens Axboe wrote:
> On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> > On Monday 05 of January 2004 17:49, Jens Axboe wrote:
> > > On Mon, Jan 05 2004, Bartlomiej Zolnierkiewicz wrote:
> > > > > calling end_request with a null sector count, ide_end_request will
> > > > > then take hard_nr_sectors which will end the whole request even if
> > > > > only one bio was finished, huh? Am I missing something here?
> > > >
> > > > No, it is used mainly to fail requests.
> > > >
> > > > This hack should be later removed with care
> > > > (there is some strange comment about locking).
> > >
> > > IIRC, it's due to it not always being safe to inspect rq state outside
> > > of ide_lock. So that makes 0 a magic value that just means 'end the
> > > first chunk' for ide_end_request().
> >
> > Why/when it is not safe to do?
>
> You would need to read hwgroup->rq.
Okay I see it, non IRQ context.
On Mon, 2004-01-05 at 17:12, Bartlomiej Zolnierkiewicz wrote:
> - hangs during reading /proc/ide/<cdrom>/identify on some drives
> (workaround is now known thanks to debugging done by Andi+BenH+Andre)
could you explain about this workaround? i've searched the archives
without finding anything.
--
fred
On Monday 05 of January 2004 20:37, Frederik Deweerdt wrote:
> On Mon, 2004-01-05 at 17:12, Bartlomiej Zolnierkiewicz wrote:
> > - hangs during reading /proc/ide/<cdrom>/identify on some drives
> > (workaround is now known thanks to debugging done by Andi+BenH+Andre)
>
> could you explain about this workaround? i've searched the archives
> without finding anything.
Because it was discovered recently and it is not the proper fix. :-)
drivers/ide/ide-taskfile.c | 9 ++++++---
1 files changed, 6 insertions(+), 3 deletions(-)
diff -puN drivers/ide/ide-taskfile.c~ide-tf-identify-fix drivers/ide/ide-taskfile.c
--- linux-2.6.1-rc1/drivers/ide/ide-taskfile.c~ide-tf-identify-fix 2004-01-04 16:35:53.094766576 +0100
+++ linux-2.6.1-rc1-root/drivers/ide/ide-taskfile.c 2004-01-04 16:45:03.240131768 +0100
@@ -797,9 +797,12 @@ check_status:
if (!OK_STAT(stat, good_stat, BAD_R_STAT)) {
if (stat & (ERR_STAT | DRQ_STAT))
return DRIVER(drive)->error(drive, __FUNCTION__, stat);
- /* BUSY_STAT: No data yet, so wait for another IRQ. */
- ide_set_handler(drive, &task_in_intr, WAIT_WORSTCASE, NULL);
- return ide_started;
+ /* Workaround for some ATAPI drives which set only BSY bit. */
+ if (drive->media != ide_cdrom) {
+ /* BUSY_STAT: No data yet, so wait for another IRQ. */
+ ide_set_handler(drive, &task_in_intr, WAIT_WORSTCASE, NULL);
+ return ide_started;
+ }
}
/*
_
On Mon, Jan 05, 2004 at 06:08:49PM +0100, Bartlomiej Zolnierkiewicz wrote:
> On Monday 05 of January 2004 04:52, Christophe Saout wrote:
>
> > BTW, what was ide_multwrite expected to return? These if clauses in
> > multwrite_intr are never executed.
>
> Dunno. It can't fail so it should be made void.
Ok, done.
> Please also add bio->bi_idx restoring for failed requests.
> Put it before DRIVER(drive)->error()
Whoops, overlooked this one. Right.
> (and remember about if (bio) check).
Remember? Can bio be NULL somewhere? Or what do you mean? It's our
scratchpad and ide_multwrite never puts a NULL bio on it.
> Otherwise I patch is OK for me.
Ok, take two.
I also did legacy/pdc4030.c, it's more or less the same though I'm not
able to test it.
--- linux.orig/drivers/ide/ide-disk.c 2004-01-04 23:29:01.000000000 +0100
+++ linux/drivers/ide/ide-disk.c 2004-01-05 23:35:35.258199832 +0100
@@ -251,7 +251,7 @@
* is shorter or smaller than the BH segment then we should be OKAY.
* This is only valid if we can rewind the rq->current_nr_sectors counter.
*/
-int ide_multwrite (ide_drive_t *drive, unsigned int mcount)
+void ide_multwrite (ide_drive_t *drive, unsigned int mcount)
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
struct request *rq = &hwgroup->wrq;
@@ -279,7 +279,7 @@
* all bvecs in this one.
*/
if (++bio->bi_idx >= bio->bi_vcnt) {
- bio->bi_idx = 0;
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
bio = bio->bi_next;
}
@@ -288,7 +288,8 @@
mcount = 0;
} else {
rq->bio = bio;
- rq->current_nr_sectors = bio_iovec(bio)->bv_len >> 9;
+ rq->nr_cbio_segments = bio_segments(bio);
+ rq->current_nr_sectors = bio_cur_sectors(bio);
rq->hard_cur_sectors = rq->current_nr_sectors;
}
}
@@ -300,8 +301,6 @@
taskfile_output_data(drive, buffer, nsect<<7);
ide_unmap_buffer(rq, buffer, &flags);
} while (mcount);
-
- return 0;
}
/*
@@ -312,6 +311,7 @@
ide_hwgroup_t *hwgroup = HWGROUP(drive);
ide_hwif_t *hwif = HWIF(drive);
struct request *rq = &hwgroup->wrq;
+ struct bio *bio = rq->bio;
u8 stat;
stat = hwif->INB(IDE_STATUS_REG);
@@ -322,8 +322,7 @@
* of the request
*/
if (rq->nr_sectors) {
- if (ide_multwrite(drive, drive->mult_count))
- return ide_stopped;
+ ide_multwrite(drive, drive->mult_count);
ide_set_handler(drive, &multwrite_intr, WAIT_CMD, NULL);
return ide_started;
}
@@ -333,14 +332,17 @@
* we can end the original request.
*/
if (!rq->nr_sectors) { /* all done? */
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
rq = hwgroup->rq;
ide_end_request(drive, 1, rq->nr_sectors);
return ide_stopped;
}
}
/* the original code did this here (?) */
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
return ide_stopped;
}
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
return DRIVER(drive)->error(drive, "multwrite_intr", stat);
}
@@ -519,14 +521,7 @@
*/
hwgroup->wrq = *rq; /* scratchpad */
ide_set_handler(drive, &multwrite_intr, WAIT_CMD, NULL);
- if (ide_multwrite(drive, drive->mult_count)) {
- unsigned long flags;
- spin_lock_irqsave(&ide_lock, flags);
- hwgroup->handler = NULL;
- del_timer(&hwgroup->timer);
- spin_unlock_irqrestore(&ide_lock, flags);
- return ide_stopped;
- }
+ ide_multwrite(drive, drive->mult_count);
} else {
unsigned long flags;
char *to = ide_map_buffer(rq, &flags);
--- linux.orig/drivers/ide/legacy/pdc4030.c 2004-01-05 20:34:29.000000000 +0100
+++ linux/drivers/ide/legacy/pdc4030.c 2004-01-05 23:34:41.895312224 +0100
@@ -443,7 +443,12 @@
static ide_startstop_t promise_complete_pollfunc(ide_drive_t *drive)
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
+#ifdef CONFIG_IDE_TASKFILE_IO
struct request *rq = hwgroup->rq;
+#else
+ struct request *rq = &hwgroup->wrq;
+ struct bio *bio = rq->bio;
+#endif
if ((HWIF(drive)->INB(IDE_STATUS_REG)) & BUSY_STAT) {
if (time_before(jiffies, hwgroup->poll_timeout)) {
@@ -472,6 +477,8 @@
while (rq->bio != rq->cbio)
(void) DRIVER(drive)->end_request(drive, 1, bio_sectors(rq->bio));
#else
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
+ rq = hwgroup->rq;
DRIVER(drive)->end_request(drive, 1, rq->hard_nr_sectors);
#endif
return ide_stopped;
@@ -530,7 +537,7 @@
* all bvecs in this one.
*/
if (++bio->bi_idx >= bio->bi_vcnt) {
- bio->bi_idx = 0;
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
bio = bio->bi_next;
}
@@ -539,7 +546,8 @@
mcount = 0;
} else {
rq->bio = bio;
- rq->current_nr_sectors = bio_iovec(bio)->bv_len >> 9;
+ rq->nr_cbio_segments = bio_segments(bio);
+ rq->current_nr_sectors = bio_cur_sectors(bio);
rq->hard_cur_sectors = rq->current_nr_sectors;
}
}
@@ -561,6 +569,9 @@
ide_hwgroup_t *hwgroup = HWGROUP(drive);
#ifdef CONFIG_IDE_TASKFILE_IO
struct request *rq = hwgroup->rq;
+#else
+ struct request *rq = &hwgroup->wrq;
+ struct bio *bio = rq->bio;
#endif
if (HWIF(drive)->INB(IDE_NSECTOR_REG) != 0) {
@@ -575,6 +586,9 @@
}
hwgroup->poll_timeout = 0;
printk(KERN_ERR "%s: write timed-out!\n",drive->name);
+#ifndef CONFIG_IDE_TASKFILE_IO
+ bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
+#endif
return DRIVER(drive)->error(drive, "write timeout",
HWIF(drive)->INB(IDE_STATUS_REG));
}
On Monday 05 of January 2004 23:51, Christophe Saout wrote:
> Remember? Can bio be NULL somewhere? Or what do you mean? It's our
> scratchpad and ide_multwrite never puts a NULL bio on it.
After last sector of the whole transfer is processed ide_multwrite() will set
it to NULL. Next IRQ is only ACK of previous datablock, no transfer happens.
> > Otherwise I patch is OK for me.
>
> Ok, take two.
>
> I also did legacy/pdc4030.c, it's more or less the same though I'm not
> able to test it.
Looks OK.
> @@ -333,14 +332,17 @@
> * we can end the original request.
> */
> if (!rq->nr_sectors) { /* all done? */
> + bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
> rq = hwgroup->rq;
> ide_end_request(drive, 1, rq->nr_sectors);
> return ide_stopped;
> }
> }
> /* the original code did this here (?) */
> + bio->bi_idx = bio->bi_vcnt - rq->nr_cbio_segments;
> return ide_stopped;
Move it before the comment.
--bart
On Tue, Jan 06, 2004 at 12:59:52AM +0100, Bartlomiej Zolnierkiewicz wrote:
> On Monday 05 of January 2004 23:51, Christophe Saout wrote:
> > Remember? Can bio be NULL somewhere? Or what do you mean? It's our
> > scratchpad and ide_multwrite never puts a NULL bio on it.
>
> After last sector of the whole transfer is processed ide_multwrite() will set
> it to NULL.
No, it doesn't.
> /* end early early we ran out of requests */
> if (!bio) {
> mcount = 0;
> } else {
> rq->bio = bio;
> rq->nr_cbio_segments = bio_segments(bio);
> rq->current_nr_sectors = bio_cur_sectors(bio);
> rq->hard_cur_sectors = rq->current_nr_sectors;
> }
rq->bio is only set if bio is not NULL.
> Next IRQ is only ACK of previous datablock, no transfer happens.
You're right, the bi_idx resetting might be redundant but since bio is
never NULL an additional check is superfluous.
> Move it before the comment.
Ok. I will repost when the issue above is worked out.
On Tuesday 06 of January 2004 12:33, Christophe Saout wrote:
> On Tue, Jan 06, 2004 at 12:59:52AM +0100, Bartlomiej Zolnierkiewicz wrote:
> > On Monday 05 of January 2004 23:51, Christophe Saout wrote:
> > > Remember? Can bio be NULL somewhere? Or what do you mean? It's our
> > > scratchpad and ide_multwrite never puts a NULL bio on it.
> >
> > After last sector of the whole transfer is processed ide_multwrite() will
> > set it to NULL.
>
> No, it doesn't.
Yep, you are right, bio is NULL, but rq->bio is not set...
--bart
Am Di, den 06.01.2004 schrieb Bartlomiej Zolnierkiewicz um 15:38:
> > > > Remember? Can bio be NULL somewhere? Or what do you mean? It's our
> > > > scratchpad and ide_multwrite never puts a NULL bio on it.
> > >
> > > After last sector of the whole transfer is processed ide_multwrite() will
> > > set it to NULL.
> >
> > No, it doesn't.
>
> Yep, you are right, bio is NULL, but rq->bio is not set...
So, shall I resend the updated patch with the one line moved above the
comment?