2006-01-17 19:41:59

by Chen, Kenneth W

Subject: [patch] bug fix in dio handling write error

There is a bug in direct-io in propagating a write error up to the
higher I/O layer. When performing an async O_DIRECT write to a
block device, if a device error occurs (such as a media error or the
disk being pulled), the error code is only propagated from the device
driver to the DIO layer. The error code stops at finished_one_bio().
The async write, however, is supposed to have a corresponding AIO
event with the appropriate return code (in this case -EIO). An
application that waits on the async write event will hang forever,
since that AIO event is lost (unless the app uses the timeout option
in its io_getevents() call; regardless, an AIO event is lost).

The problem is that the call to aio_complete() is conditioned on
READ, but it should handle WRITE errors as well.
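
For reference, the timeout option mentioned above could be used roughly
as follows. This is only a sketch, not part of the patch: it calls the
raw Linux AIO syscalls directly, and the helper name
aio_write_with_timeout() and the sys_* wrappers are hypothetical names
introduced for illustration. A timeout keeps the caller from blocking
forever, but it does not recover the lost event.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers over the raw AIO syscalls (hypothetical helper names). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{
	return syscall(SYS_io_setup, nr, ctx);
}

static long sys_io_destroy(aio_context_t ctx)
{
	return syscall(SYS_io_destroy, ctx);
}

static long sys_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
	return syscall(SYS_io_submit, ctx, nr, iocbs);
}

static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
			     struct io_event *events, struct timespec *ts)
{
	return syscall(SYS_io_getevents, ctx, min_nr, nr, events, ts);
}

/*
 * Submit one async write and wait at most timeout_sec for its event.
 * Returns the event result (bytes written, or a negative errno such as
 * -EIO), -ETIMEDOUT if no event arrived in time, or a negative errno
 * if a syscall itself failed.
 */
long aio_write_with_timeout(int fd, const void *buf, size_t len,
			    long long off, int timeout_sec)
{
	aio_context_t ctx = 0;
	struct iocb cb;
	struct iocb *list[1] = { &cb };
	struct io_event ev;
	struct timespec ts = { .tv_sec = timeout_sec, .tv_nsec = 0 };
	long ret;

	if (sys_io_setup(1, &ctx) < 0)
		return -errno;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_buf = (uint64_t)(uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = off;

	if (sys_io_submit(ctx, 1, list) != 1) {
		ret = -errno;
		goto out;
	}

	/* Without a timeout, a lost completion event blocks here forever. */
	ret = sys_io_getevents(ctx, 1, 1, &ev, &ts);
	if (ret == 1)
		ret = (long)ev.res;	/* result carried by the AIO event */
	else if (ret == 0)
		ret = -ETIMEDOUT;	/* no event within the timeout */
	else
		ret = -errno;
out:
	sys_io_destroy(ctx);
	return ret;
}
```

For the case described here the caller would open the block device with
O_DIRECT and pass a suitably aligned buffer; with the fix applied, a
failed write would be expected to come back as -EIO rather than as a
timeout.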


Signed-off-by: Ken Chen <[email protected]>


--- linux-2.6.15/fs/direct-io.c.orig 2006-01-17 11:54:17.010422462 -0800
+++ linux-2.6.15/fs/direct-io.c 2006-01-17 12:08:00.444982688 -0800
@@ -253,8 +253,7 @@ static void finished_one_bio(struct dio
 			dio_complete(dio, offset, transferred);
 
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result == dio->size ||
-				((dio->rw == READ) && dio->result)) {
+			if (dio->result == dio->size || dio->result) {
 				aio_complete(dio->iocb, transferred, 0);
 				kfree(dio);
 				return;
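
To make the effect of the changed condition easier to see, here is a
small userspace model of the two predicates. It is only a sketch: the
function names and the enum are hypothetical, and it models nothing
beyond the if () condition in the hunk above.

```c
#include <stdbool.h>
#include <sys/types.h>	/* ssize_t */

/* Stand-ins for the kernel's READ/WRITE values (assumed for this model). */
enum dio_rw { DIO_READ = 0, DIO_WRITE = 1 };

/* Condition before the patch: a partial result completes AIO only for READ. */
static bool completes_aio_before(enum dio_rw rw, ssize_t result, ssize_t size)
{
	return result == size || (rw == DIO_READ && result);
}

/* Condition after the patch: any nonzero result completes AIO. */
static bool completes_aio_after(enum dio_rw rw, ssize_t result, ssize_t size)
{
	(void)rw;	/* direction no longer matters */
	return result == size || result;
}
```

With this model, a WRITE with result 4096 and size 8192 does not reach
aio_complete() under the old condition but does under the new one. A
READ with the same values passes both, and a result of 0 with a nonzero
size passes neither.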




2006-01-17 23:27:15

by Badari Pulavarty

Subject: Re: [patch] bug fix in dio handling write error

On Tue, 2006-01-17 at 11:41 -0800, Chen, Kenneth W wrote:
> There is a bug in direct-io in propagating a write error up to the
> higher I/O layer. When performing an async O_DIRECT write to a
> block device, if a device error occurs (such as a media error or the
> disk being pulled), the error code is only propagated from the device
> driver to the DIO layer. The error code stops at finished_one_bio().
> The async write, however, is supposed to have a corresponding AIO
> event with the appropriate return code (in this case -EIO). An
> application that waits on the async write event will hang forever,
> since that AIO event is lost (unless the app uses the timeout option
> in its io_getevents() call; regardless, an AIO event is lost).
>
> The problem is that the call to aio_complete() is conditioned on
> READ, but it should handle WRITE errors as well.
>
>
> Signed-off-by: Ken Chen <[email protected]>
>
>
> --- linux-2.6.15/fs/direct-io.c.orig 2006-01-17 11:54:17.010422462 -0800
> +++ linux-2.6.15/fs/direct-io.c 2006-01-17 12:08:00.444982688 -0800
> @@ -253,8 +253,7 @@ static void finished_one_bio(struct dio
>  			dio_complete(dio, offset, transferred);
>  
>  			/* Complete AIO later if falling back to buffered i/o */
> -			if (dio->result == dio->size ||
> -				((dio->rw == READ) && dio->result)) {
> +			if (dio->result == dio->size || dio->result) {
>  				aio_complete(dio->iocb, transferred, 0);
>  				kfree(dio);
>  				return;
>
>

I vaguely remember adding the explicit "dio->rw == READ" check for a
reason (which escapes me right now). Suparna, do you remember? Let me
think about it and get back to you.

Thanks,
Badari