2014-10-03 18:06:39

by Kent Overstreet

[permalink] [raw]
Subject: [PATCH] aio: Fix return code of io_submit() (RFC)

io_submit() could return -EAGAIN on memory allocation failure when it should
really have been returning -ENOMEM. This could confuse applications (i.e. fio)
since -EAGAIN means "too many requests outstanding, wait until completions have
been reaped" and if the application actually was tracking outstanding
completions this wouldn't make a lot of sense.

NOTE:

the man page seems to imply that the current behaviour (-EAGAIN on allocation
failure) has always been the case. I don't think it makes a lot of sense, but
this should probably be discussed more widely in case applications have somehow
come to rely on the current behaviour...

Signed-off-by: Kent Overstreet <[email protected]>
Cc: Benjamin LaHaise <[email protected]>
Cc: Zach Brown <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Slava Pestov <[email protected]>
---
fs/aio.c | 26 +++++++++++++-------------
1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/aio.c b/fs/aio.c
index 733750096b..556547044b 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -933,23 +933,14 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
{
struct kiocb *req;

- if (!get_reqs_available(ctx)) {
- user_refill_reqs_available(ctx);
- if (!get_reqs_available(ctx))
- return NULL;
- }
-
req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
if (unlikely(!req))
- goto out_put;
+ return NULL;

percpu_ref_get(&ctx->reqs);

req->ki_ctx = ctx;
return req;
-out_put:
- put_reqs_available(ctx, 1);
- return NULL;
}

static void kiocb_free(struct kiocb *req)
@@ -1489,9 +1480,17 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
return -EINVAL;
}

+ if (!get_reqs_available(ctx)) {
+ user_refill_reqs_available(ctx);
+ if (!get_reqs_available(ctx))
+ return -EAGAIN;
+ }
+
req = aio_get_req(ctx);
- if (unlikely(!req))
- return -EAGAIN;
+ if (unlikely(!req)) {
+ ret = -ENOMEM;
+ goto out_put;
+ }

req->ki_filp = fget(iocb->aio_fildes);
if (unlikely(!req->ki_filp)) {
@@ -1533,9 +1532,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,

return 0;
out_put_req:
- put_reqs_available(ctx, 1);
percpu_ref_put(&ctx->reqs);
kiocb_free(req);
+out_put:
+ put_reqs_available(ctx, 1);
return ret;
}

--
2.1.1


2014-10-03 18:13:45

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On 2014-10-03 12:08, Kent Overstreet wrote:
> io_submit() could return -EAGAIN on memory allocation failure when it should
> really have been returning -ENOMEM. This could confuse applications (i.e. fio)
> since -EAGAIN means "too many requests outstanding, wait until completions have
> been reaped" and if the application actually was tracking outstanding
> completions this wouldn't make a lot of sense.
>
> NOTE:
>
> the man page seems to imply that the current behaviour (-EAGAIN on allocation
> failure) has always been the case. I don't think it makes a lot of sense, but
> this should probably be discussed more widely in case applications have somehow
> come to rely on the current behaviour...

We can't really feasibly fix this, is my worry. Fio does track the
pending requests and does not get into a getevents() forever wait if it
gets -EAGAIN on submission. But before the fix, it would loop forever in
submission in -EAGAIN.

How are applications supposed to deal with ENOMEM? I think the answer
here is that they can't, it would be a fatal condition. AIO must provide
isn't own guarantee of progress, with a mempool or similar.


--
Jens Axboe

2014-10-03 18:19:25

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

Hi Kent,

On Fri, Oct 03, 2014 at 11:08:13AM -0700, Kent Overstreet wrote:
> io_submit() could return -EAGAIN on memory allocation failure when it should
> really have been returning -ENOMEM. This could confuse applications (i.e. fio)
> since -EAGAIN means "too many requests outstanding, wait until completions have
> been reaped" and if the application actually was tracking outstanding
> completions this wouldn't make a lot of sense.

Wouldn't it be simpler to just return an ERR_PTR with the appropriate
return code rather than move all that code around?

-ben

> NOTE:
>
> the man page seems to imply that the current behaviour (-EAGAIN on allocation
> failure) has always been the case. I don't think it makes a lot of sense, but
> this should probably be discussed more widely in case applications have somehow
> come to rely on the current behaviour...
>
> Signed-off-by: Kent Overstreet <[email protected]>
> Cc: Benjamin LaHaise <[email protected]>
> Cc: Zach Brown <[email protected]>
> Cc: Jeff Moyer <[email protected]>
> Cc: Jens Axboe <[email protected]>
> Cc: Slava Pestov <[email protected]>
> ---
> fs/aio.c | 26 +++++++++++++-------------
> 1 file changed, 13 insertions(+), 13 deletions(-)
>
> diff --git a/fs/aio.c b/fs/aio.c
> index 733750096b..556547044b 100644
> --- a/fs/aio.c
> +++ b/fs/aio.c
> @@ -933,23 +933,14 @@ static inline struct kiocb *aio_get_req(struct kioctx *ctx)
> {
> struct kiocb *req;
>
> - if (!get_reqs_available(ctx)) {
> - user_refill_reqs_available(ctx);
> - if (!get_reqs_available(ctx))
> - return NULL;
> - }
> -
> req = kmem_cache_alloc(kiocb_cachep, GFP_KERNEL|__GFP_ZERO);
> if (unlikely(!req))
> - goto out_put;
> + return NULL;
>
> percpu_ref_get(&ctx->reqs);
>
> req->ki_ctx = ctx;
> return req;
> -out_put:
> - put_reqs_available(ctx, 1);
> - return NULL;
> }
>
> static void kiocb_free(struct kiocb *req)
> @@ -1489,9 +1480,17 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
> return -EINVAL;
> }
>
> + if (!get_reqs_available(ctx)) {
> + user_refill_reqs_available(ctx);
> + if (!get_reqs_available(ctx))
> + return -EAGAIN;
> + }
> +
> req = aio_get_req(ctx);
> - if (unlikely(!req))
> - return -EAGAIN;
> + if (unlikely(!req)) {
> + ret = -ENOMEM;
> + goto out_put;
> + }
>
> req->ki_filp = fget(iocb->aio_fildes);
> if (unlikely(!req->ki_filp)) {
> @@ -1533,9 +1532,10 @@ static int io_submit_one(struct kioctx *ctx, struct iocb __user *user_iocb,
>
> return 0;
> out_put_req:
> - put_reqs_available(ctx, 1);
> percpu_ref_put(&ctx->reqs);
> kiocb_free(req);
> +out_put:
> + put_reqs_available(ctx, 1);
> return ret;
> }
>
> --
> 2.1.1

--
"Thought is the essence of where you are now."

2014-10-03 18:21:57

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
> On 2014-10-03 12:08, Kent Overstreet wrote:
> >io_submit() could return -EAGAIN on memory allocation failure when it should
> >really have been returning -ENOMEM. This could confuse applications (i.e. fio)
> >since -EAGAIN means "too many requests outstanding, wait until completions have
> >been reaped" and if the application actually was tracking outstanding
> >completions this wouldn't make a lot of sense.
> >
> >NOTE:
> >
> >the man page seems to imply that the current behaviour (-EAGAIN on allocation
> >failure) has always been the case. I don't think it makes a lot of sense, but
> >this should probably be discussed more widely in case applications have somehow
> >come to rely on the current behaviour...
>
> We can't really feasibly fix this, is my worry. Fio does track the pending
> requests and does not get into a getevents() forever wait if it gets -EAGAIN
> on submission. But before the fix, it would loop forever in submission in
> -EAGAIN.
>
> How are applications supposed to deal with ENOMEM? I think the answer here
> is that they can't, it would be a fatal condition. AIO must provide isn't
> own guarantee of progress, with a mempool or similar.

Well, even though the AIO code doesn't currently return -ENOMEM we definitely do
have random other driver/filesystem code that will return -ENOMEM if a random
GFP_KERNEL allocation fails (e.g. the dio code, if allocating a struct dio
fails). So I think there's precedent for this, and having it be a fatal error
when the system is under major memory pressure is not a crazy thing to do too.

But OTOH maybe we should just use a mempool there.

The argument against making it a mempool would be "we don't want io_submit() to
block; even if that's not the case today, we at least have a chance of fixing it
with the current setup. If we can't allocate memory for our asynchronous state,
we really can't do anything there except block or fail".

I'm not sure I have strong feelings one way or the other.

2014-10-03 18:22:23

by Benjamin LaHaise

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
> On 2014-10-03 12:08, Kent Overstreet wrote:
> >io_submit() could return -EAGAIN on memory allocation failure when it
> >should
> >really have been returning -ENOMEM. This could confuse applications (i.e.
> >fio)
> >since -EAGAIN means "too many requests outstanding, wait until completions
> >have
> >been reaped" and if the application actually was tracking outstanding
> >completions this wouldn't make a lot of sense.
> >
> >NOTE:
> >
> >the man page seems to imply that the current behaviour (-EAGAIN on
> >allocation
> >failure) has always been the case. I don't think it makes a lot of sense,
> >but
> >this should probably be discussed more widely in case applications have
> >somehow
> >come to rely on the current behaviour...
>
> We can't really feasibly fix this, is my worry. Fio does track the
> pending requests and does not get into a getevents() forever wait if it
> gets -EAGAIN on submission. But before the fix, it would loop forever in
> submission in -EAGAIN.

There are lots of instances in the kernel where out of memory is potentially
exposed to the user. If we're failing a memory allocation that is well under
1KB, the system is probably completely hosed.

> How are applications supposed to deal with ENOMEM? I think the answer
> here is that they can't, it would be a fatal condition. AIO must provide
> isn't own guarantee of progress, with a mempool or similar.

I'm not sure if using a mempool is appropriate for allocations that are
driven by userland code. At least with an ENOMEM error, an application
could free up some of the memory it allocated and possibly recover the
system.

-ben
--
"Thought is the essence of where you are now."

2014-10-03 18:25:29

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On 2014-10-03 12:21, Kent Overstreet wrote:
> On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
>> On 2014-10-03 12:08, Kent Overstreet wrote:
>>> io_submit() could return -EAGAIN on memory allocation failure when it should
>>> really have been returning -ENOMEM. This could confuse applications (i.e. fio)
>>> since -EAGAIN means "too many requests outstanding, wait until completions have
>>> been reaped" and if the application actually was tracking outstanding
>>> completions this wouldn't make a lot of sense.
>>>
>>> NOTE:
>>>
>>> the man page seems to imply that the current behaviour (-EAGAIN on allocation
>>> failure) has always been the case. I don't think it makes a lot of sense, but
>>> this should probably be discussed more widely in case applications have somehow
>>> come to rely on the current behaviour...
>>
>> We can't really feasibly fix this, is my worry. Fio does track the pending
>> requests and does not get into a getevents() forever wait if it gets -EAGAIN
>> on submission. But before the fix, it would loop forever in submission in
>> -EAGAIN.
>>
>> How are applications supposed to deal with ENOMEM? I think the answer here
>> is that they can't, it would be a fatal condition. AIO must provide isn't
>> own guarantee of progress, with a mempool or similar.
>
> Well, even though the AIO code doesn't currently return -ENOMEM we definitely do
> have random other driver/filesystem code that will return -ENOMEM if a random
> GFP_KERNEL allocation fails (e.g. the dio code, if allocating a struct dio
> fails). So I think there's precedent for this, and having it be a fatal error
> when the system is under major memory pressure is not a crazy thing to do too.
>
> But OTOH maybe we should just use a mempool there.
>
> The argument against making it a mempool would be "we don't want io_submit() to
> block; even if that's not the case today, we at least have a chance of fixing it
> with the current setup. If we can't allocate memory for our asynchronous state,
> we really can't do anything there except block or fail".

It'll block anyway in other places, if we run out of resources there.
But good point on the other potential -ENOMEM cases, it's not a new
condition (potentially).

> I'm not sure I have strong feelings one way or the other.

Me neither...

--
Jens Axboe

2014-10-03 18:31:52

by Kent Overstreet

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On Fri, Oct 03, 2014 at 02:22:20PM -0400, Benjamin LaHaise wrote:
> On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
> > How are applications supposed to deal with ENOMEM? I think the answer
> > here is that they can't, it would be a fatal condition. AIO must provide
> > isn't own guarantee of progress, with a mempool or similar.
>
> I'm not sure if using a mempool is appropriate for allocations that are
> driven by userland code. At least with an ENOMEM error, an application
> could free up some of the memory it allocated and possibly recover the
> system.

I guess it's going to depend on the application... some applications really want
to always make forward progress (much like a lot of code in the kernel), so
they're going to want the mempool semantics and we in the kernel are in a much
better position to implement that correctly (think of all the applications that
are just going to sleep and retry on -ENOMEM).

we kind of want another flag in the syscall args that's the moral equivalent of
MSG_DONTWAIT but for memory allocations; it'd translate into "mempool +
GFP_KERNEL, or GFP_NOWAIT".

not that I'm actually going to implement that :)

2014-10-03 18:36:15

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On 2014-10-03 12:22, Benjamin LaHaise wrote:
> On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
>> On 2014-10-03 12:08, Kent Overstreet wrote:
>>> io_submit() could return -EAGAIN on memory allocation failure when it
>>> should
>>> really have been returning -ENOMEM. This could confuse applications (i.e.
>>> fio)
>>> since -EAGAIN means "too many requests outstanding, wait until completions
>>> have
>>> been reaped" and if the application actually was tracking outstanding
>>> completions this wouldn't make a lot of sense.
>>>
>>> NOTE:
>>>
>>> the man page seems to imply that the current behaviour (-EAGAIN on
>>> allocation
>>> failure) has always been the case. I don't think it makes a lot of sense,
>>> but
>>> this should probably be discussed more widely in case applications have
>>> somehow
>>> come to rely on the current behaviour...
>>
>> We can't really feasibly fix this, is my worry. Fio does track the
>> pending requests and does not get into a getevents() forever wait if it
>> gets -EAGAIN on submission. But before the fix, it would loop forever in
>> submission in -EAGAIN.
>
> There are lots of instances in the kernel where out of memory is potentially
> exposed to the user. If we're failing a memory allocation that is well under
> 1KB, the system is probably completely hosed.
>
>> How are applications supposed to deal with ENOMEM? I think the answer
>> here is that they can't, it would be a fatal condition. AIO must provide
>> isn't own guarantee of progress, with a mempool or similar.
>
> I'm not sure if using a mempool is appropriate for allocations that are
> driven by userland code. At least with an ENOMEM error, an application
> could free up some of the memory it allocated and possibly recover the
> system.

Since fio just hit this, it has nothing it can potentially free to make
progress possible. There was no pending IO, so all it can do is quit.
But I do agree that if a small alloc like that fails, then we are
probably pretty darn screwed anyway, and it doesn't matter that much
what we do. My main concern was a potential change in the ABI, but since
we could already return -ENOMEM from other cases, that is probably a
moot point.

--
Jens Axboe

2014-10-03 18:39:26

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aio: Fix return code of io_submit() (RFC)

On 2014-10-03 12:31, Kent Overstreet wrote:
> On Fri, Oct 03, 2014 at 02:22:20PM -0400, Benjamin LaHaise wrote:
>> On Fri, Oct 03, 2014 at 12:13:39PM -0600, Jens Axboe wrote:
>>> How are applications supposed to deal with ENOMEM? I think the answer
>>> here is that they can't, it would be a fatal condition. AIO must provide
>>> isn't own guarantee of progress, with a mempool or similar.
>>
>> I'm not sure if using a mempool is appropriate for allocations that are
>> driven by userland code. At least with an ENOMEM error, an application
>> could free up some of the memory it allocated and possibly recover the
>> system.
>
> I guess it's going to depend on the application... some applications really want
> to always make forward progress (much like a lot of code in the kernel), so
> they're going to want the mempool semantics and we in the kernel are in a much
> better position to implement that correctly (think of all the applications that
> are just going to sleep and retry on -ENOMEM).

Precisely, there's no real way to do that in the application. Especially
if it has no pending IO it can just wait on, it'll be a sleep and retry
thing

> we kind of want another flag in the syscall args that's the moral equivalent of
> MSG_DONTWAIT but for memory allocations; it'd translate into "mempool +
> GFP_KERNEL, or GFP_NOWAIT".

We do...

> not that I'm actually going to implement that :)

It's worth keeping in mind for if we do extend the API for some reason.

--
Jens Axboe