2004-09-03 15:54:50

by Badari Pulavarty

Subject: Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application

On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> Begin forwarded message:
>
> Date: Tue, 31 Aug 2004 06:15:18 -0700
> From: [email protected]
> To: [email protected]
> Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
>
>
> http://bugme.osdl.org/show_bug.cgi?id=3317
>

Hi Andrew,

I debugged this some more. Here is what's happening:

The test program used a program text address as the buffer to READ into.
DIO get_user_pages() returned EFAULT. We called finished_one_bio()
as part of dropping the ref. to the dio, and it called aio_complete().
do_direct_IO() then returned EFAULT to the caller. aio_run_iocb() expects
to see EIOCBQUEUED/RETRY; otherwise it calls aio_complete() with the
"ret" value. This is where the second aio_complete() is coming from.
So we clean up "req" twice, and on the next de-ref we get an OOPS.
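
To make the sequence above concrete, here is a minimal user-space sketch
of the double completion. It is not kernel code: every fake_* name, the
FAKE_QUEUED sentinel and the 0 passed on the first completion are
assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>

/* Arbitrary sentinel standing in for the kernel's "queued" return code. */
#define FAKE_QUEUED (-1000L)

/* Hypothetical stand-in for the AIO request; it only counts completions. */
struct fake_req {
    int completions;    /* how many times the request was completed */
    long last_result;   /* result passed to the most recent completion */
};

static void fake_aio_complete(struct fake_req *req, long res)
{
    req->completions++;
    req->last_result = res;
}

/* Models the direct-IO path when pinning the user pages fails. */
static long fake_direct_io(struct fake_req *req, int pages_fault)
{
    if (pages_fault) {
        /* Dropping the dio ref completes the request (assumed result 0,
         * since nothing was transferred) ... */
        fake_aio_complete(req, 0);
        /* ... and the error is also handed back to the caller. */
        return -EFAULT;
    }
    return FAKE_QUEUED;
}

/* Models the submitter: anything other than "queued" is completed here. */
static void fake_run_iocb(struct fake_req *req, int pages_fault)
{
    long ret = fake_direct_io(req, pages_fault);

    if (ret != FAKE_QUEUED)
        fake_aio_complete(req, ret);    /* second completion on the fault path */
}
```

Calling fake_run_iocb() with pages_fault set leaves the counter at two,
which corresponds to the request being torn down twice in the real code.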

The problem here is that finished_one_bio() shouldn't call aio_complete(),
since no work has been done. I have a fix for this - can you verify it?
I am not really comfortable with this "tweaking". (I am also not sure
whether IO errors like EIO etc. can lead to calling aio_complete()
twice.)


The fix is to call aio_complete() ONLY if there is something to report.
Note that we don't update dio->result with any error codes from
get_user_pages(); they are just passed as the "ret" value from do_direct_IO().
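
The idea can be sketched in user space as follows. This is not the
attached patch: the fake_* names, the FAKE_QUEUED sentinel and the
"result != 0" test are assumptions chosen to illustrate completing only
when there is something to report.

```c
#include <assert.h>
#include <errno.h>

/* Arbitrary sentinel standing in for the kernel's "queued" return code. */
#define FAKE_QUEUED (-1000L)

/* Hypothetical stand-in for the AIO request; it only counts completions. */
struct fake_req {
    int completions;
    long last_result;
};

/* Hypothetical stand-in for the dio; result == 0 means nothing to report. */
struct fake_dio {
    struct fake_req *req;
    long result;    /* bytes transferred or an IO error */
};

static void fake_aio_complete(struct fake_req *req, long res)
{
    req->completions++;
    req->last_result = res;
}

/* Complete the request from here only if the dio has something to report. */
static void fake_finished_one_bio(struct fake_dio *dio)
{
    if (dio->result != 0)
        fake_aio_complete(dio->req, dio->result);
}

static long fake_direct_io(struct fake_dio *dio, int pages_fault, long bytes)
{
    if (pages_fault) {
        /* The page-pinning error is not stored in dio->result ... */
        fake_finished_one_bio(dio);
        /* ... it only travels back as the return value. */
        return -EFAULT;
    }
    dio->result = bytes;
    fake_finished_one_bio(dio);
    return FAKE_QUEUED;
}

/* Models the submitter: anything other than "queued" is completed here. */
static void fake_run_iocb(struct fake_dio *dio, int pages_fault, long bytes)
{
    long ret = fake_direct_io(dio, pages_fault, bytes);

    if (ret != FAKE_QUEUED)
        fake_aio_complete(dio->req, ret);
}
```

In this model both the fault path and the normal path complete the
request exactly once: the fault path through the submitter with the
error, the normal path through fake_finished_one_bio() with the byte
count.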

Thanks,
Badari








Attachments:
aio-dio.patch (557.00 B)

2004-09-03 22:58:18

by Daniel McNeil

Subject: Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application

On Fri, 2004-09-03 at 08:52, Badari Pulavarty wrote:
> On Tue, 2004-08-31 at 08:18, Andrew Morton wrote:
> > Begin forwarded message:
> >
> > Date: Tue, 31 Aug 2004 06:15:18 -0700
> > From: [email protected]
> > To: [email protected]
> > Subject: [Bugme-new] [Bug 3317] New: Kernel oops in aio_complete while running AIO application
> >
> >
> > http://bugme.osdl.org/show_bug.cgi?id=3317
> >
>
> Hi Andrew,
>
> I debugged this some more. Here is whats happening:
>
> The test program used program text address as buffer to do the READ to.
> DIO get_user_pages() returned EFAULT. We called finished_one_bio()
> as part of dropping the ref. to dio. It called aio_complete().
> do_direct_IO() returned EFAULT to the caller. aio_run_iocb() expects
> to see EIOCBQUEUED/RETRY, otherwise it calls aio_complete() with the
> "ret" value. This is where the second aio_complete() is coming from.
> So we cleanup "req" and on the next de-ref we get OOPS.
>
> The problem here is, finished_one_bio() shouldn't call aio_complete()
> since no work has been done. I have a fix for this - can you verify this
> ? I am not really comfortable with this "tweaking". (I am not really
> sure about IO errors like EIO etc. - if they can lead to calling
> aio_complete() twice)
>
>
> Fix is to call aio_complete() ONLY if there is something to report.
> Note the we don't update dio->result with any error codes from
> get_user_pages(), they just passed as "ret" value from do_direct_IO().
>
> Thanks,
> Badari

Badari,

This does fix the problem when running on my system (ext3).

One question: finished_one_bio() is called in 3 places.
Are you sure the other places won't be harmed by this
change?

I'm also looking over the code and will let you know if
I see any problems.

Daniel

2004-09-04 17:32:39

by Badari Pulavarty

Subject: Re: [Bug 3317] New: Kernel oops in aio_complete while running AIO application

Daniel,

aio_complete() gets called only when we are done with this dio.
Other calls to finished_one_bio() should be fine. dio->result
should have the return value we want to send back. The fix
I made is to call aio_complete() only if we have something to
report back.

One problem is that dio->result gets updated for IO errors but
doesn't get updated for errors from get_user_pages(). Things
should be fine, but I am not really comfortable returning half
of the errors through aio_complete() and the other half through the
return value of do_direct_IO(). I guess it's okay, since some of the
IO errors can happen only after we submit the bio.

Thanks,
Badari
