2011-01-19 22:36:38

by Chuck Lever

Subject: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

Nick Piggin reports:

> I'm getting use after frees in aio code in NFS
>
> [ 2703.396766] Call Trace:
> [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
> [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.

To address this, prevent the NFS direct I/O engine from completing
async iocbs when the forward path returns an error other than
EIOCBQUEUED.

This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."

Cc: Stable <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---

Here's my take.
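
For context, the fs/aio.c contract at issue, sketched roughly (this
is illustrative, not the literal kernel source): the generic
submission path completes any iocb whose ->aio_read/->aio_write
method returns something other than -EIOCBQUEUED.

	ret = iocb->ki_filp->f_op->aio_write(iocb, iov, nr_segs, pos);
	if (ret != -EIOCBQUEUED)
		/*
		 * Generic completion.  If the filesystem already called
		 * aio_complete() itself and then returned a plain error,
		 * this is a second completion of a freed iocb -- the
		 * use-after-free in the trace above.
		 */
		aio_complete(iocb, ret, 0);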

fs/nfs/direct.c | 32 +++++++++++++++++---------------
1 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index e6ace0d..c2176f4 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
pos += vec->iov_len;
}

- if (put_dreq(dreq))
- nfs_direct_complete(dreq);
-
- if (requested_bytes != 0)
- return 0;
+ /*
+ * If no bytes were started, return the error, and let the
+ * generic layer handle the completion.
+ */
+ if (requested_bytes == 0)
+ return result < 0 ? result : -EIO;

- if (result < 0)
- return result;
- return -EIO;
+ if (put_dreq(dreq))
+ nfs_direct_write_complete(dreq, dreq->inode);
+ return 0;
}

static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
@@ -841,15 +842,16 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
pos += vec->iov_len;
}

+ /*
+ * If no bytes were started, return the error, and let the
+ * generic layer handle the completion.
+ */
+ if (requested_bytes == 0)
+ return result < 0 ? result : -EIO;
+
if (put_dreq(dreq))
nfs_direct_write_complete(dreq, dreq->inode);
-
- if (requested_bytes != 0)
- return 0;
-
- if (result < 0)
- return result;
- return -EIO;
+ return 0;
}

static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,



2011-01-21 02:44:39

by Wengang Wang

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On 11-01-19 17:36, Chuck Lever wrote:
> Nick Piggin reports:
>
> > I'm getting use after frees in aio code in NFS
> >
> > [ 2703.396766] Call Trace:
> > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
> > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
> >
> > Adding some tracing, it is due to nfs completing the request then
> > returning something other than -EIOCBQUEUED, so aio.c
> > also completes the request.
>
> To address this, prevent the NFS direct I/O engine from completing
> async iocbs when the forward path returns an error other than
> EIOCBQUEUED.
>
> This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."
>
> Cc: Stable <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> Here's my take.
>
> fs/nfs/direct.c | 32 +++++++++++++++++---------------
> 1 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index e6ace0d..c2176f4 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
> pos += vec->iov_len;
> }
>
> - if (put_dreq(dreq))
> - nfs_direct_complete(dreq);
> -
> - if (requested_bytes != 0)
> - return 0;
> + /*
> + * If no bytes were started, return the error, and let the
> + * generic layer handle the completion.
> + */

But the put_dreq() -> nfs_direct_complete() path does more than
complete the aio.  It also drops a ref on dreq with put_dreq() and does

complete_all(&dreq->completion);
nfs_direct_req_release(dreq);

I think we still need that called somewhere.
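
For reference, that helper in this part of fs/nfs/direct.c looks
roughly like the following (reconstructed from memory, so treat the
details as approximate):

	static void nfs_direct_complete(struct nfs_direct_req *dreq)
	{
		/* Async case: report the result to the aio layer. */
		if (dreq->iocb) {
			long res = (long) dreq->error;
			if (!res)
				res = (long) dreq->count;
			aio_complete(dreq->iocb, res, 0);
		}
		/* Sync case: wake the caller blocked in nfs_direct_wait(). */
		complete_all(&dreq->completion);
		/* Drop the request's reference to itself. */
		nfs_direct_req_release(dreq);
	}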

regards,
wengang.

> + if (requested_bytes == 0)
> + return result < 0 ? result : -EIO;
>
> - if (result < 0)
> - return result;
> - return -EIO;
> + if (put_dreq(dreq))
> + nfs_direct_write_complete(dreq, dreq->inode);
> + return 0;
> }
>
> static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
> @@ -841,15 +842,16 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
> pos += vec->iov_len;
> }
>
> + /*
> + * If no bytes were started, return the error, and let the
> + * generic layer handle the completion.
> + */
> + if (requested_bytes == 0)
> + return result < 0 ? result : -EIO;
> +
> if (put_dreq(dreq))
> nfs_direct_write_complete(dreq, dreq->inode);
> -
> - if (requested_bytes != 0)
> - return 0;
> -
> - if (result < 0)
> - return result;
> - return -EIO;
> + return 0;
> }
>
> static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
>

2011-01-19 23:31:50

by Myklebust, Trond

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
> <[email protected]> wrote:
> > On Wed, 2011-01-19 at 17:36 -0500, Chuck Lever wrote:
> >> Nick Piggin reports:
> >>
> >> > I'm getting use after frees in aio code in NFS
> >> >
> >> > [ 2703.396766] Call Trace:
> >> > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> >> > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> >> > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> >> > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> >> > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> >> > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> >> > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> >> > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> >> > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
> >> > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> >> > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> >> > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
> >> >
> >> > Adding some tracing, it is due to nfs completing the request then
> >> > returning something other than -EIOCBQUEUED, so aio.c
> >> > also completes the request.
> >>
> >> To address this, prevent the NFS direct I/O engine from completing
> >> async iocbs when the forward path returns an error other than
> >> EIOCBQUEUED.
> >>
> >> This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."
> >>
> >> Cc: Stable <[email protected]>
> >> Signed-off-by: Chuck Lever <[email protected]>
> >> ---
> >>
> >> Here's my take.
> >>
> >> fs/nfs/direct.c | 32 +++++++++++++++++---------------
> >> 1 files changed, 17 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> >> index e6ace0d..c2176f4 100644
> >> --- a/fs/nfs/direct.c
> >> +++ b/fs/nfs/direct.c
> >> @@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
> >> pos += vec->iov_len;
> >> }
> >>
> >> - if (put_dreq(dreq))
> >> - nfs_direct_complete(dreq);
> >> -
> >> - if (requested_bytes != 0)
> >> - return 0;
> >> + /*
> >> + * If no bytes were started, return the error, and let the
> >> + * generic layer handle the completion.
> >> + */
> >> + if (requested_bytes == 0)
> >> + return result < 0 ? result : -EIO;
> >>
> >> - if (result < 0)
> >> - return result;
> >> - return -EIO;
> >> + if (put_dreq(dreq))
> >> + nfs_direct_write_complete(dreq, dreq->inode);
> > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > In nfs_direct_read_schedule_iovec()? Shouldn't that be
> > nfs_direct_complete(dreq);
> >
> > Also, why is EIO the correct reply when no bytes were read/written? Why
> > shouldn't the VFS aio code be able to cope with a zero byte reply?
>
> What would it do?

Just return that zero byte reply to userland.

zero bytes is a valid reply for ordinary read() and write(), so why
should we have to do anything different for aio_read()/aio_write()?

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-01-19 23:49:40

by Myklebust, Trond

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, 2011-01-20 at 10:37 +1100, Nick Piggin wrote:
> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
> <[email protected]> wrote:
> > On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
> >> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
>
> >> > Also, why is EIO the correct reply when no bytes were read/written? Why
> >> > shouldn't the VFS aio code be able to cope with a zero byte reply?
> >>
> >> What would it do?
> >
> > Just return that zero byte reply to userland.
> >
> > zero bytes is a valid reply for ordinary read() and write(), so why
> > should we have to do anything different for aio_read()/aio_write()?
>
> It doesn't give userspace much to do. zero reply from read means
> EOF. Zero reply from write is pretty useless, I don't think we do it
> in the buffered write path -- we either ensure we write at least
> something or have a meaningful error to return.

zero reply from read means EOF _or_ user supplied a zero length buffer.

zero reply from write may also be useless, but it is a valid value. It
can simply mean the user supplied a zero length buffer.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-01-19 23:40:49

by Chuck Lever

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"


On Jan 19, 2011, at 6:37 PM, Nick Piggin wrote:

> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
> <[email protected]> wrote:
>> On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
>>> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
>
>>>> Also, why is EIO the correct reply when no bytes were read/written? Why
>>>> shouldn't the VFS aio code be able to cope with a zero byte reply?
>>>
>>> What would it do?
>>
>> Just return that zero byte reply to userland.
>>
>> zero bytes is a valid reply for ordinary read() and write(), so why
>> should we have to do anything different for aio_read()/aio_write()?
>
> It doesn't give userspace much to do. zero reply from read means
> EOF. Zero reply from write is pretty useless, I don't think we do it
> in the buffered write path -- we either ensure we write at least
> something or have a meaningful error to return.

I think in this case, the zero-length requests are already shunted off. No zero-length requests make it down here, IIRC. So we expect that either some bytes are started, or an error occurs. If zero bytes were started and no error occurred, that's just... wrong.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2011-01-19 23:37:38

by Nicholas Piggin

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
<[email protected]> wrote:
> On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
>> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust

>> > Also, why is EIO the correct reply when no bytes were read/written? Why
>> > shouldn't the VFS aio code be able to cope with a zero byte reply?
>>
>> What would it do?
>
> Just return that zero byte reply to userland.
>
> zero bytes is a valid reply for ordinary read() and write(), so why
> should we have to do anything different for aio_read()/aio_write()?

It doesn't give userspace much to do. zero reply from read means
EOF. Zero reply from write is pretty useless, I don't think we do it
in the buffered write path -- we either ensure we write at least
something or have a meaningful error to return.

2011-01-19 23:42:46

by Nicholas Piggin

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, Jan 20, 2011 at 10:39 AM, Chuck Lever <[email protected]> wrote:
>
> On Jan 19, 2011, at 6:37 PM, Nick Piggin wrote:
>
>> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
>>>> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
>>
>>>>> Also, why is EIO the correct reply when no bytes were read/written? Why
>>>>> shouldn't the VFS aio code be able to cope with a zero byte reply?
>>>>
>>>> What would it do?
>>>
>>> Just return that zero byte reply to userland.
>>>
>>> zero bytes is a valid reply for ordinary read() and write(), so why
>>> should we have to do anything different for aio_read()/aio_write()?
>>
>> It doesn't give userspace much to do. zero reply from read means
>> EOF. Zero reply from write is pretty useless, I don't think we do it
>> in the buffered write path -- we either ensure we write at least
>> something or have a meaningful error to return.
>
> I think in this case, the zero-length requests are already shunted off. No zero-length requests make it down here, IIRC. So we expect that either some bytes are started, or an error occurs. If zero bytes were started and no error occurred, that's just... wrong.

Right. So in this case you should have an error to return.

2011-01-19 23:32:34

by Chuck Lever

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"


On Jan 19, 2011, at 6:25 PM, Trond Myklebust wrote:

> On Wed, 2011-01-19 at 17:36 -0500, Chuck Lever wrote:
>> Nick Piggin reports:
>>
>>> I'm getting use after frees in aio code in NFS
>>>
>>> [ 2703.396766] Call Trace:
>>> [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
>>> [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
>>> [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
>>> [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
>>> [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
>>> [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
>>> [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
>>> [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
>>> [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
>>> [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
>>> [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
>>> [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>>>
>>> Adding some tracing, it is due to nfs completing the request then
>>> returning something other than -EIOCBQUEUED, so aio.c
>>> also completes the request.
>>
>> To address this, prevent the NFS direct I/O engine from completing
>> async iocbs when the forward path returns an error other than
>> EIOCBQUEUED.
>>
>> This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."
>>
>> Cc: Stable <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> Here's my take.
>>
>> fs/nfs/direct.c | 32 +++++++++++++++++---------------
>> 1 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index e6ace0d..c2176f4 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
>> pos += vec->iov_len;
>> }
>>
>> - if (put_dreq(dreq))
>> - nfs_direct_complete(dreq);
>> -
>> - if (requested_bytes != 0)
>> - return 0;
>> + /*
>> + * If no bytes were started, return the error, and let the
>> + * generic layer handle the completion.
>> + */
>> + if (requested_bytes == 0)
>> + return result < 0 ? result : -EIO;
>>
>> - if (result < 0)
>> - return result;
>> - return -EIO;
>> + if (put_dreq(dreq))
>> + nfs_direct_write_complete(dreq, dreq->inode);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> In nfs_direct_read_schedule_iovec()? Shouldn't that be
> nfs_direct_complete(dreq);

Yes, copy and paste error. Thanks for the review.
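
That is, the tail of the read-side hunk presumably wants to be:

	if (put_dreq(dreq))
		nfs_direct_complete(dreq);
	return 0;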

> Also, why is EIO the correct reply when no bytes were read/written? Why
> shouldn't the VFS aio code be able to cope with a zero byte reply?

-EIO is returned only if no other error code has been provided so far. If no bytes were transferred, normally there is going to be some error code generated by the lower layer. Should we make this a BUG instead?

A zero-byte write is allowed, of course, but I think that kind of request is shunted off before we arrive here.
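
If memory serves, that shunt is up in nfs_file_direct_write(), before
the dreq engine is even entered; something like (again from memory,
so approximate):

	retval = generic_write_checks(file, &pos, &count, 0);
	if (retval)
		goto out;

	retval = 0;
	if (!count)
		goto out;	/* zero-length writes never get here */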

>
>> + return 0;
>> }
>>
>> static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
>> @@ -841,15 +842,16 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
>> pos += vec->iov_len;
>> }
>>
>> + /*
>> + * If no bytes were started, return the error, and let the
>> + * generic layer handle the completion.
>> + */
>> + if (requested_bytes == 0)
>> + return result < 0 ? result : -EIO;
>> +
>> if (put_dreq(dreq))
>> nfs_direct_write_complete(dreq, dreq->inode);
>> -
>> - if (requested_bytes != 0)
>> - return 0;
>> -
>> - if (result < 0)
>> - return result;
>> - return -EIO;
>> + return 0;
>> }
>>
>> static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
>>
>
> --
> Trond Myklebust
> Linux NFS client maintainer
>
> NetApp
> [email protected]
> http://www.netapp.com
>

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com





2011-01-19 23:50:49

by Nicholas Piggin

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, Jan 20, 2011 at 10:49 AM, Trond Myklebust
<[email protected]> wrote:
> On Thu, 2011-01-20 at 10:37 +1100, Nick Piggin wrote:
>> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
>> <[email protected]> wrote:
>> > On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
>> >> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
>>
>> >> > Also, why is EIO the correct reply when no bytes were read/written? Why
>> >> > shouldn't the VFS aio code be able to cope with a zero byte reply?
>> >>
>> >> What would it do?
>> >
>> > Just return that zero byte reply to userland.
>> >
>> > zero bytes is a valid reply for ordinary read() and write(), so why
>> > should we have to do anything different for aio_read()/aio_write()?
>>
>> It doesn't give userspace much to do. zero reply from read means
>> EOF. Zero reply from write is pretty useless, I don't think we do it
>> in the buffered write path -- we either ensure we write at least
>> something or have a meaningful error to return.
>
> zero reply from read means EOF _or_ user supplied a zero length buffer.
>
> zero reply from write may also be useless, but it is a valid value. It
> can simply mean the user supplied a zero length buffer.

OK, yes. I'm ignoring the zero-length request case.

2011-01-19 23:57:07

by Myklebust, Trond

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, 2011-01-20 at 10:42 +1100, Nick Piggin wrote:
> On Thu, Jan 20, 2011 at 10:39 AM, Chuck Lever <[email protected]> wrote:
> >
> > On Jan 19, 2011, at 6:37 PM, Nick Piggin wrote:
> >
> >> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
> >> <[email protected]> wrote:
> >>> On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
> >>>> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
> >>
> >>>>> Also, why is EIO the correct reply when no bytes were read/written? Why
> >>>>> shouldn't the VFS aio code be able to cope with a zero byte reply?
> >>>>
> >>>> What would it do?
> >>>
> >>> Just return that zero byte reply to userland.
> >>>
> >>> zero bytes is a valid reply for ordinary read() and write(), so why
> >>> should we have to do anything different for aio_read()/aio_write()?
> >>
> >> It doesn't give userspace much to do. zero reply from read means
> >> EOF. Zero reply from write is pretty useless, I don't think we do it
> >> in the buffered write path -- we either ensure we write at least
> >> something or have a meaningful error to return.
> >
> > I think in this case, the zero-length requests are already shunted off. No zero-length requests make it down here, IIRC. So we expect that either some bytes are started, or an error occurs. If zero bytes were started and no error occurred, that's just... wrong.
>
> Right. So in this case you should have an error to return.

So, which error??? What is going wrong that ensures that we don't start
any I/O? Is it the pages that are failing to fault in, or is it the RPC
call that is failing to start (if so, why)?

I don't like to be returning EIO if a more specific (and useful) error
exists.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-01-19 23:26:58

by Nicholas Piggin

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
<[email protected]> wrote:
> On Wed, 2011-01-19 at 17:36 -0500, Chuck Lever wrote:
>> Nick Piggin reports:
>>
>> > I'm getting use after frees in aio code in NFS
>> >
>> > [ 2703.396766] Call Trace:
>> > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
>> > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
>> > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
>> > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
>> > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
>> > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
>> > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
>> > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
>> > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
>> > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
>> > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
>> > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>> >
>> > Adding some tracing, it is due to nfs completing the request then
>> > returning something other than -EIOCBQUEUED, so aio.c
>> > also completes the request.
>>
>> To address this, prevent the NFS direct I/O engine from completing
>> async iocbs when the forward path returns an error other than
>> EIOCBQUEUED.
>>
>> This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."
>>
>> Cc: Stable <[email protected]>
>> Signed-off-by: Chuck Lever <[email protected]>
>> ---
>>
>> Here's my take.
>>
>> fs/nfs/direct.c | 32 +++++++++++++++++---------------
>> 1 files changed, 17 insertions(+), 15 deletions(-)
>>
>> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
>> index e6ace0d..c2176f4 100644
>> --- a/fs/nfs/direct.c
>> +++ b/fs/nfs/direct.c
>> @@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
>> pos += vec->iov_len;
>> }
>>
>> - if (put_dreq(dreq))
>> - nfs_direct_complete(dreq);
>> -
>> - if (requested_bytes != 0)
>> - return 0;
>> + /*
>> + * If no bytes were started, return the error, and let the
>> + * generic layer handle the completion.
>> + */
>> + if (requested_bytes == 0)
>> + return result < 0 ? result : -EIO;
>>
>> - if (result < 0)
>> - return result;
>> - return -EIO;
>> + if (put_dreq(dreq))
>> + nfs_direct_write_complete(dreq, dreq->inode);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> In nfs_direct_read_schedule_iovec()? Shouldn't that be
> nfs_direct_complete(dreq);
>
> Also, why is EIO the correct reply when no bytes were read/written? Why
> shouldn't the VFS aio code be able to cope with a zero byte reply?

What would it do?

2011-01-19 23:25:56

by Myklebust, Trond

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"

On Wed, 2011-01-19 at 17:36 -0500, Chuck Lever wrote:
> Nick Piggin reports:
>
> > I'm getting use after frees in aio code in NFS
> >
> > [ 2703.396766] Call Trace:
> > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60
> > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
> >
> > Adding some tracing, it is due to nfs completing the request then
> > returning something other than -EIOCBQUEUED, so aio.c
> > also completes the request.
>
> To address this, prevent the NFS direct I/O engine from completing
> async iocbs when the forward path returns an error other than
> EIOCBQUEUED.
>
> This appears to survive ^C during both "xfstest no. 208" and "fsx -Z."
>
> Cc: Stable <[email protected]>
> Signed-off-by: Chuck Lever <[email protected]>
> ---
>
> Here's my take.
>
> fs/nfs/direct.c | 32 +++++++++++++++++---------------
> 1 files changed, 17 insertions(+), 15 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index e6ace0d..c2176f4 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -407,15 +407,16 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
> pos += vec->iov_len;
> }
>
> - if (put_dreq(dreq))
> - nfs_direct_complete(dreq);
> -
> - if (requested_bytes != 0)
> - return 0;
> + /*
> + * If no bytes were started, return the error, and let the
> + * generic layer handle the completion.
> + */
> + if (requested_bytes == 0)
> + return result < 0 ? result : -EIO;
>
> - if (result < 0)
> - return result;
> - return -EIO;
> + if (put_dreq(dreq))
> + nfs_direct_write_complete(dreq, dreq->inode);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In nfs_direct_read_schedule_iovec()? Shouldn't that be
nfs_direct_complete(dreq);

Also, why is EIO the correct reply when no bytes were read/written? Why
shouldn't the VFS aio code be able to cope with a zero byte reply?

> + return 0;
> }
>
> static ssize_t nfs_direct_read(struct kiocb *iocb, const struct iovec *iov,
> @@ -841,15 +842,16 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
> pos += vec->iov_len;
> }
>
> + /*
> + * If no bytes were started, return the error, and let the
> + * generic layer handle the completion.
> + */
> + if (requested_bytes == 0)
> + return result < 0 ? result : -EIO;
> +
> if (put_dreq(dreq))
> nfs_direct_write_complete(dreq, dreq->inode);
> -
> - if (requested_bytes != 0)
> - return 0;
> -
> - if (result < 0)
> - return result;
> - return -EIO;
> + return 0;
> }
>
> static ssize_t nfs_direct_write(struct kiocb *iocb, const struct iovec *iov,
>

--
Trond Myklebust
Linux NFS client maintainer

NetApp
[email protected]
http://www.netapp.com


2011-01-19 23:58:29

by Chuck Lever

Subject: Re: [PATCH] NFS: Fix "BUG at fs/aio.c:554!"


On Jan 19, 2011, at 6:49 PM, Trond Myklebust wrote:

> On Thu, 2011-01-20 at 10:37 +1100, Nick Piggin wrote:
>> On Thu, Jan 20, 2011 at 10:31 AM, Trond Myklebust
>> <[email protected]> wrote:
>>> On Thu, 2011-01-20 at 10:26 +1100, Nick Piggin wrote:
>>>> On Thu, Jan 20, 2011 at 10:25 AM, Trond Myklebust
>>
>>>>> Also, why is EIO the correct reply when no bytes were read/written? Why
>>>>> shouldn't the VFS aio code be able to cope with a zero byte reply?
>>>>
>>>> What would it do?
>>>
>>> Just return that zero byte reply to userland.
>>>
>>> zero bytes is a valid reply for ordinary read() and write(), so why
>>> should we have to do anything different for aio_read()/aio_write()?
>>
>> It doesn't give userspace much to do. zero reply from read means
>> EOF. Zero reply from write is pretty useless, I don't think we do it
>> in the buffered write path -- we either ensure we write at least
>> something or have a meaningful error to return.
>
> zero reply from read means EOF _or_ user supplied a zero length buffer.
>
> zero reply from write may also be useless, but it is a valid value. It
> can simply mean the user supplied a zero length buffer.

This code is in the forward path. So, here we are just dealing with starting the I/O. If the server replies with a short read or write, that's handled in the callbacks, not here.

So, -EIO is appropriate if we couldn't even start the I/O, right?

Anyway, this is copied from the existing code, not something new with this patch. If we want to change this part of the logic, maybe it should be part of a different patch?

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com