2006-03-21 08:51:47

by Chen, Kenneth W

Subject: [patch] direct-io: bug fix in dio handling write error

From: "Chen, Kenneth W" <[email protected]>

There is a bug in direct-io when propagating a write error up to the
higher I/O layer. When performing an async O_DIRECT write to a
block device, if a device error occurs (such as a media error, or the
disk being pulled), the error code is propagated from the device
driver only as far as the DIO layer: it stops at finished_one_bio().
The async write, however, is supposed to have a corresponding AIO
event with the appropriate return code (in this case -EIO). An
application that waits on the async write event will then hang
forever, since that AIO event is lost (if the application did not use
the timeout option in its io_getevents call; regardless, an AIO event
is lost).
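For illustration only (this sketch is not part of the patch; the
device path, sizes, and minimal error handling are made up), the kind
of waiter that hangs looks roughly like this; build with -laio:

#define _GNU_SOURCE		/* for O_DIRECT */
#include <libaio.h>
#include <fcntl.h>
#include <stdlib.h>

int main(void)
{
	io_context_t ctx = 0;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	void *buf;
	int fd;

	posix_memalign(&buf, 512, 4096);	/* O_DIRECT wants aligned buffers */
	fd = open("/dev/sdb", O_WRONLY | O_DIRECT);	/* path is made up */

	io_setup(1, &ctx);
	io_prep_pwrite(&cb, fd, buf, 4096, 0);
	io_submit(ctx, 1, cbs);

	/*
	 * NULL timeout: if the device fails and the AIO completion
	 * event is lost, this call never returns; that is the hang
	 * described above.  With the fix it returns one event whose
	 * ev.res is -EIO.
	 */
	io_getevents(ctx, 1, 1, &ev, NULL);

	return (long)ev.res < 0 ? 1 : 0;	/* res is unsigned in libaio */
}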

The discovery of the above bug led to the discovery of a potential
race window involving dio->result. The fundamental problem is that
dio->result is overloaded with dual uses: an indicator of the
fall-back path for a partial dio write, and an error indicator used
in the I/O completion path. In the event of a device error, setting
dio->result to -EIO clashes with the value used to track the partial
write that activates the fall-back path.

It was also pointed out that it is impossible to use dio->result to
track a partial write and at the same time track an error returned
from the device driver: direct_io_worker() can only determine whether
it is a partial write at the end of I/O submission, yet in mid-stream
of that submission a return code could already be coming back from
the driver, messing up all the subsequent logic.

The proposed fix is to separate the error code returned by the I/O
completion path from partial I/O submission tracking. A new field is
added to the dio structure specifically to track an I/O error returned
in the completion path.
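To make the dual use concrete, here is a toy userspace model
(hypothetical; struct dio_model and the sizes are invented for
illustration, this is not kernel code) of why one overloaded field
cannot serve both purposes, and how the separate field resolves it:

#include <stdio.h>

struct dio_model {
	long result;	/* bytes submitted so far; partial-write tracking */
	int io_error;	/* the new field; error from the completion path */
};

int main(void)
{
	struct dio_model dio = { 0, 0 };
	long requested = 131072;	/* 128K requested */

	dio.result += 65536;	/* first 64K submitted */
	dio.io_error = -5;	/* -EIO arrives from a completing bio,
				   in mid-submission */
	dio.result += 65536;	/* submission continues meanwhile */

	/*
	 * With the single overloaded field, the -EIO assignment would
	 * have destroyed the byte count; with two fields, both the
	 * fall-back test and the completion error stay valid.
	 */
	printf("partial write: %s\n", dio.result < requested ? "yes" : "no");
	printf("io_error: %d\n", dio.io_error);
	return 0;
}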


Signed-off-by: Ken Chen <[email protected]>
Acked-by: Zach Brown <[email protected]>
Acked-by: Suparna Bhattacharya <[email protected]>
Cc: Badari Pulavarty <[email protected]>
---

The patch was tested by pulling a SCSI drive while running aiostress,
as well as by running the test code at
http://developer.osdl.org/daniel/AIO/TESTS/

Andrew - please merge this version. Thank you.


--- ./fs/direct-io.c.orig	2006-01-02 19:21:10.000000000 -0800
+++ ./fs/direct-io.c	2006-03-21 01:28:48.704475280 -0800
@@ -129,6 +129,7 @@ struct dio {
 	/* AIO related stuff */
 	struct kiocb *iocb;		/* kiocb */
 	int is_async;			/* is IO async ? */
+	int io_error;			/* IO error in completion path */
 	ssize_t result;			/* IO result */
 };

@@ -250,6 +251,10 @@ static void finished_one_bio(struct dio
 			    ((offset + transferred) > dio->i_size))
 				transferred = dio->i_size - offset;

+			/* check for error in completion path */
+			if (dio->io_error)
+				transferred = dio->io_error;
+
 			dio_complete(dio, offset, transferred);

 			/* Complete AIO later if falling back to buffered i/o */
@@ -406,7 +411,7 @@ static int dio_bio_complete(struct dio *
 	int page_no;

 	if (!uptodate)
-		dio->result = -EIO;
+		dio->io_error = -EIO;

 	if (dio->is_async && dio->rw == READ) {
 		bio_check_pages_dirty(bio);	/* transfers ownership */
@@ -964,6 +969,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->next_block_for_io = -1;

 	dio->page_errors = 0;
+	dio->io_error = 0;
 	dio->result = 0;
 	dio->iocb = iocb;
 	dio->i_size = i_size_read(inode);




2006-03-21 16:54:58

by Badari Pulavarty

Subject: Re: [patch] direct-io: bug fix in dio handling write error

On Tue, 2006-03-21 at 00:51 -0800, Chen, Kenneth W wrote:
....
>
> Andrew - please merge this version. Thank you.
>
>
> --- ./fs/direct-io.c.orig	2006-01-02 19:21:10.000000000 -0800
> +++ ./fs/direct-io.c	2006-03-21 01:28:48.704475280 -0800
> @@ -129,6 +129,7 @@ struct dio {
>  	/* AIO related stuff */
>  	struct kiocb *iocb;		/* kiocb */
>  	int is_async;			/* is IO async ? */
> +	int io_error;			/* IO error in completion path */
>  	ssize_t result;			/* IO result */
>  };
>
> @@ -250,6 +251,10 @@ static void finished_one_bio(struct dio
>  			    ((offset + transferred) > dio->i_size))
>  				transferred = dio->i_size - offset;
>
> +			/* check for error in completion path */
> +			if (dio->io_error)
> +				transferred = dio->io_error;
> +
>  			dio_complete(dio, offset, transferred);
>
>  			/* Complete AIO later if falling back to buffered i/o */
> @@ -406,7 +411,7 @@ static int dio_bio_complete(struct dio *
>  	int page_no;
>
>  	if (!uptodate)
> -		dio->result = -EIO;
> +		dio->io_error = -EIO;
>
>  	if (dio->is_async && dio->rw == READ) {
>  		bio_check_pages_dirty(bio);	/* transfers ownership */
> @@ -964,6 +969,7 @@ direct_io_worker(int rw, struct kiocb *i
>  	dio->next_block_for_io = -1;
>
>  	dio->page_errors = 0;
> +	dio->io_error = 0;
>  	dio->result = 0;
>  	dio->iocb = iocb;
>  	dio->i_size = i_size_read(inode);
>

Ken,

I hate to do this to you - but your patch breaks error handling on
synchronous DIO requests.

Since you are using "dio->io_error" instead of "dio->result" to
represent an error - you need to make sure to check that (also?)
instead of dio->result in direct_io_worker() before calling
dio_complete().

Isn't it? Am I missing something?

Thanks,
Badari

2006-03-21 19:03:14

by Chen, Kenneth W

Subject: RE: [patch] direct-io: bug fix in dio handling write error

Badari Pulavarty wrote on Tuesday, March 21, 2006 8:57 AM
> I hate to do this to you - but your patch breaks error handling on
> synchronous DIO requests.
>
> Since you are using "dio->io_error" instead of "dio->result" to
> represent an error - you need to make sure to check that (also?)
> instead of dio->result in direct_io_worker() before calling
> dio_complete().
>
> Isn't it? Am I missing something?


That's the other part of the maze. AFAICS, in the synchronous path,
dio_bio_complete() already implicitly checks for the -EIO error:

static int dio_bio_complete(struct dio *dio, struct bio *bio)
{
	...
	return uptodate ? 0 : -EIO;
}

And that error code bubbles up to direct_io_worker()'s sync path:

direct_io_worker()
{
	...
	if (dio->is_async) {
		...
	} else {
		ret2 = dio_await_completion(dio);
		if (ret == 0)
			ret = ret2;
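For completeness, dio_await_completion() is the loop that collects
each bio's return code (sketched from memory with elisions; not a
verbatim quote of the source):

static int dio_await_completion(struct dio *dio)
{
	int ret = 0;
	...
	while (dio->bio_count) {
		struct bio *bio = dio_await_one(dio);
		int ret2 = dio_bio_complete(dio, bio);

		if (ret == 0)
			ret = ret2;	/* first error wins */
	}
	return ret;
}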

I've also explicitly run a test case for synchronous writes and found
no regression there. I admit my test coverage may not be very
comprehensive, but I've done the best I can.

It's entirely possible there are more corner cases. But let's get some
coverage here with -mm and then add fixes as we go.

- Ken

2006-03-21 20:56:09

by Badari Pulavarty

Subject: RE: [patch] direct-io: bug fix in dio handling write error

On Tue, 2006-03-21 at 11:03 -0800, Chen, Kenneth W wrote:
> Badari Pulavarty wrote on Tuesday, March 21, 2006 8:57 AM
> > I hate to do this to you - but your patch breaks error handling on
> > synchronous DIO requests.
> >
> > Since you are using "dio->io_error" instead of "dio->result" to
> > represent an error - you need to make sure to check that (also?)
> > instead of dio->result in direct_io_worker() before calling
> > dio_complete().
> >
> > Isn't it? Am I missing something?
>
>
> That's the other part of the maze. AFAICS, in the synchronous path,
> dio_bio_complete() already implicitly checks for the -EIO error:
>
> static int dio_bio_complete(struct dio *dio, struct bio *bio)
> {
> 	...
> 	return uptodate ? 0 : -EIO;
> }
>
> And that error code bubbles up to direct_io_worker()'s sync path:
>
> direct_io_worker()
> {
> 	...
> 	if (dio->is_async) {
> 		...
> 	} else {
> 		ret2 = dio_await_completion(dio);
> 		if (ret == 0)
> 			ret = ret2;
>
> I've also explicitly run a test case for synchronous writes and found
> no regression there. I admit my test coverage may not be very
> comprehensive, but I've done the best I can.
>
> It's entirely possible there are more corner cases. But let's get some
> coverage here with -mm and then add fixes as we go.

I know that is not your fault - but does this mean that we can't
return success for partial IO?

If someone asks to do IO for 128K and we get -EIO after, say, 64K -
do we fail the whole IO with -EIO?

Thanks,
Badari

2006-03-21 23:10:55

by Andrew Morton

Subject: Re: [patch] direct-io: bug fix in dio handling write error

Badari Pulavarty <[email protected]> wrote:
>
> If someone asks to do IO for 128K and we get -EIO after, say, 64K -
> do we fail the whole IO with -EIO?

A multiple-block direct-IO can complete those blocks (actually the BIOs) in
any order. How do we communicate to userspace that segments 0, 1, 3 and 5
completed OK, but segments 2 and 4 had errors?

The best we could do is to tell userspace that 0 and 1 completed and ignore
everything after the first I/O error.

But I don't think it's worth the effort. Simply returning -EIO for the
whole thing is probably good enough. The only situation I can think of
where someone would care down to that level of detail is a specialised
recover-data-from-a-bad-disk application (for which direct-io would be a
good tool). Such an application should have the brains to go
sector-at-a-time if larger I/Os throw errors.
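For what it's worth, such an application might do something like the
following (a hypothetical sketch; salvage_read, the 512-byte sector
size and the zero-fill policy are assumptions, and it presumes fd was
opened with O_DIRECT and buf is 512-byte aligned, so buf + i stays
aligned):

#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static ssize_t salvage_read(int fd, char *buf, size_t len, off_t off)
{
	ssize_t got = pread(fd, buf, len, off);

	if (got >= 0)
		return got;	/* whole extent readable in one go */

	/* Large read failed: fall back to one sector at a time. */
	for (size_t i = 0; i < len; i += 512) {
		if (pread(fd, buf + i, 512, off + i) != 512)
			memset(buf + i, 0, 512);	/* unreadable sector */
	}
	return len;
}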