Andrew,
Here is a patch to fix EINVAL handling in io_submit_one() that was
causing a process hang when attempting AIO to a device not able
to handle aio. I hit this doing a AIO read from /dev/zero. The
process would hang on exit in wait_for_all_aios(). The fix is
to check for EINVAL coming back from aio_setup_iocb() in addition
to the EFAULT and EBADF already there. This causes the io_submit
to fail with EINVAL. That check looks error prone.
Are there other error return values where it should jump to the
aio_put_req()? Should the check be:
if (ret != 0 && ret != -EIOCBQUEUED)
goto out_put_req;
Thanks,
Daniel McNeil <[email protected]>
Daniel McNeil wrote:
>Are there other error return values where it should jump to the
>aio_put_req()? Should the check be:
>
>
The situation is much worse. The io_submit_one() code in 2.5.71
distinguishes between conditions where io_submit should fail (which goto
out_put_req) and conditions where the queued operation completes
immediately (which result in a call to aio_complete()). The patch in
2.5.71-mm1 which separates out aio_setup_iocb() loses track of this
distinction, mishandling any case where the queued operation completes
immediately. Aio poll, for instance, depends on being able to indicate
immediate completion.
So the part of aio-01-retry.patch that splits out aio_setup_iocb() is
completely broken.
On Mon, Jun 16, 2003 at 06:33:13PM -0700, John Myers wrote:
>
> Daniel McNeil wrote:
>
> >Are there other error return values where it should jump to the
> >aio_put_req()? Should the check be:
> >
> >
> The situation is much worse. The io_submit_one() code in 2.5.71
> distinguishes between conditions where io_submit should fail (which goto
> out_put_req) and conditions where the queued operation completes
> immediately (which result in a call to aio_complete()). The patch in
> 2.5.71-mm1 which separates out aio_setup_iocb() loses track of this
> distinction, mishandling any case where the queued operation completes
> immediately. Aio poll, for instance, depends on being able to indicate
> immediate completion.
The code for aio_read/write does distinguish between these cases.
- if you spot a case where it doesn't do let me know.
aio_setup_iocb() just sets up the method after performing the
specified checks. Its aio_run_iocb() which actually executes it.
>
> So the part of aio-01-retry.patch that splits out aio_setup_iocb() is
> completely broken.
>
Actually, looking closer, I think its just aio poll that's
incorrectly merged here. The right way to implement aio poll in
the new model would have been to setup a retry method for it
in aio_setup_iocb(), not run generic_aio_poll() directly there.
Regards
Suparna
--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Labs, India
Suparna Bhattacharya wrote:
>The right way to implement aio poll in
>the new model would have been to setup a retry method for it
>in aio_setup_iocb(), not run generic_aio_poll() directly there.
>
>
That seems unnecessarily indirect and complicated. It also adds
additional cases for the cancellation path, furthur adding to
complexity. The aio poll mechanism is perfectly capable of arranging
callbacks itself.
The check should be:
if (-EIOCBQUEUED == ret)
return 0;
if (unlikely(ret))
goto out_put_req;
Then handling of immediate completion should be inside either
aio_setup_iocb() or generic_aio_poll(), resulting in a call to
aio_complete() and a return from aio_setup_iocb() of -EIOCBQUEUED.
On another point, I certainly don't like the way aio_run_iocb()
unconditionally completes with -EINTR on cancellation. A cancelled
read/write should return the number of bytes transferred.
I have tested 2.5.72-mm1 and, as expected, it fixes the hang.
Interestingly, the mm1 code behaves differently than generic
2.5.72 -- in the error case (like aio read from /dev/zero),
2.5.72-mm1 returns an EINVAL error on io_submit(). In 2.5.72
io_submit succeeds and the i/o itself get an error. I prefer
getting the error on io_submit.
Daniel
On Mon, 2003-06-16 at 20:24, Suparna Bhattacharya wrote:
> On Mon, Jun 16, 2003 at 06:33:13PM -0700, John Myers wrote:
> >
> > Daniel McNeil wrote:
> >
> > >Are there other error return values where it should jump to the
> > >aio_put_req()? Should the check be:
> > >
> > >
> > The situation is much worse. The io_submit_one() code in 2.5.71
> > distinguishes between conditions where io_submit should fail (which goto
> > out_put_req) and conditions where the queued operation completes
> > immediately (which result in a call to aio_complete()). The patch in
> > 2.5.71-mm1 which separates out aio_setup_iocb() loses track of this
> > distinction, mishandling any case where the queued operation completes
> > immediately. Aio poll, for instance, depends on being able to indicate
> > immediate completion.
>
> The code for aio_read/write does distinguish between these cases.
> - if you spot a case where it doesn't do let me know.
> aio_setup_iocb() just sets up the method after performing the
> specified checks. Its aio_run_iocb() which actually executes it.
>
> >
> > So the part of aio-01-retry.patch that splits out aio_setup_iocb() is
> > completely broken.
> >
>
> Actually, looking closer, I think its just aio poll that's
> incorrectly merged here. The right way to implement aio poll in
> the new model would have been to setup a retry method for it
> in aio_setup_iocb(), not run generic_aio_poll() directly there.
>
> Regards
> Suparna
>
> --
> Suparna Bhattacharya ([email protected])
> Linux Technology Center
> IBM Software Labs, India
Daniel McNeil wrote:
>I prefer getting the error on io_submit.
>
>
I prefer getting the error on io_getevents(). The code that handles the
output of io_getevents() already has to handle per-operation errors.
The only thing user space can reasonably do with errors returned by
io_submit() is to retry or assert. Anything else would duplicate logic
that already has to be in and is better handled by the code that handles
the output io_getevents().
io_submit() should only return errors in cases where the caller is
obviously buggy/malicious or in cases where it is not possible to
generate an event.
On Tue, Jun 17, 2003 at 05:03:36PM -0700, John Myers wrote:
> >I prefer getting the error on io_submit.
> >
> >
> I prefer getting the error on io_getevents(). The code that handles the
POSIX 1003.1 says this about aio_read() and aio_write():
If an error condition is encountered during queuing, the function
call shall return without having initiated or queued the request.
If you intend to ever allow a POSIX wrapper to these interfaces (I have
one, for instance), you need to return EINVAL, EBADF, and the like from
io_submit(). Note that io_submit() has read-like semantics, so an
additional call may be necessary if some iocbs were successfully queued.
A user has to handle EAGAIN, so io_submit() cannot return void, and you
already have error handling logic here.
Joel
--
"Born under a bad sign.
I been down since I began to crawl.
If it wasn't for bad luck,
I wouldn't have no luck at all."
http://www.jlbec.org/
[email protected]
Joel Becker wrote:
>POSIX 1003.1 says this about aio_read() and aio_write():
> If an error condition is encountered during queuing, the function
> call shall return without having initiated or queued the request.
>
>If you intend to ever allow a POSIX wrapper to these interfaces (I have
>one, for instance), you need to return EINVAL, EBADF, and the like from
>io_submit().
>
No, you just declare that those errors happend "after queuing."
>A user has to handle EAGAIN, so io_submit() cannot return void, and you
>already have error handling logic here.
>
>
EAGAIN error handling does not require contextual information about the
operation being queued. Error handling logic that knows about the
context of the operation queued already has to exist in the
io_getevents() processing.
On Tue, Jun 17, 2003 at 05:25:09PM -0700, John Myers wrote:
> No, you just declare that those errors happend "after queuing."
But POSIX specifes that all detection that can happen before
queuing should.
> EAGAIN error handling does not require contextual information about the
> operation being queued. Error handling logic that knows about the
> context of the operation queued already has to exist in the
> io_getevents() processing.
You also now require a poll round and aditional system call to
see the errors. In addition, you waste resources until you pick up the
errorneous call.
Joel
--
Life's Little Instruction Book #237
"Seek out the good in people."
http://www.jlbec.org/
[email protected]
If the error can be determined at submissions time, it should be
returned at submission time. It is a waste for the application to have
to setup for an async completion, when the operation has *already*
failed. I think that the number of apps that would find this logic
desirable is significant enough that we should support it. *If* we
feel that enough apps would also support the "*always* return
asynchronously" methodology, then maybe we should look at conditionally
supporting both.
Actually, for the immediate return case, we should not only support
immediate return of errors, but also successful synchronous completions.
ie, libaio already calls down to the lower layer desriptor specific
code to submit the operation, and has return values for that submission.
If those return values indicate that the operation has ALREADY completed
(synchronously), it makes sense to be *able* to return that to the
application. If the application can take advantage of this synchronous
completion, it could be a nice performance win in terms of the application
not having to keep that memory reserved and not having to track the aio
operation (ie, whatever app specific context it keeps associated w/ that
operation).
There is also a precedent for async api's supporting synchronous
completions in the industry o/s' which support async io.
One question is how would the interface change to support this model.
I like the ability to support *both* methodologies. We could simply
leave the existing interface, and apply the "always return async"
semantics to it. The NEW interface could simply return an list of
io_events, like io_get_events, w/ the immediate completions.
Another possibility is to change the iocb to actually contain an
io_event, and add to it the ability to be added to a linked list.
This way we could simply return a pointer to a list of iocb's, instead
of always requiring the user to setup an array of io_events. ie, i
would prefer to only have to worry about the memory model for
*one* entity (iocb's), instead of the current *two* entities
(iocb's and io_event's). In fact, this paradigm could also be
applied to io_get_events.
Regards, -sm
John Myers wrote:
> Daniel McNeil wrote:
>
>> I prefer getting the error on io_submit.
>>
>>
> I prefer getting the error on io_getevents(). The code that handles the
> output of io_getevents() already has to handle per-operation errors.
> The only thing user space can reasonably do with errors returned by
> io_submit() is to retry or assert. Anything else would duplicate logic
> that already has to be in and is better handled by the code that handles
> the output io_getevents().
>
> io_submit() should only return errors in cases where the caller is
> obviously buggy/malicious or in cases where it is not possible to
> generate an event.
>
>
--
Scot McKinley--------------------------------------------------------
Oracle Corporation/Network Development >\\\|/<
500 Oracle Parkway, 2OP410 (o) (o) Phone:650.506.9481
Redwood Shores, CA 94065----------oOOO--(_)--OOOo-Fax :650.506.7226
Joel Becker wrote:
>But POSIX specifes that all detection that can happen before queuing should.
>
The kernel would have to be substantially more complex to report all
errors that could possibly be detected during queuing. The kernel could
even detect success during queuing if it really tried.
This is not a reasonable requirement. A correctly written program has
to be able to handle errors reported asynchronously. Why else would
they be using an asynchronous API?
> You also now require a poll round and aditional system call to
>see the errors. In addition, you waste resources until you pick up the
>errorneous call.
>
So? That is a miniscule amount of resources used by an extremely rare
condition. Such a picayune optimization hardly justifies the necessary
increase in complexity.
On Wed, Jun 18, 2003 at 05:33:27PM -0700, John Myers wrote:
> The kernel would have to be substantially more complex to report all
> errors that could possibly be detected during queuing. The kernel could
> even detect success during queuing if it really tried.
The slippery slope isn't important. POSIX specifies EAGAIN
(you concede that), EBADF, and EINVAL against nbytes|offset|reqprio.
The kernel does these checks already (where applicable).
Anyway, a caller of io_submit() already has to handle errors.
Just like a short read, you always have to be wary of them.
Joel
--
"Glory is fleeting, but obscurity is forever."
- Napoleon Bonaparte
http://www.jlbec.org/
[email protected]
Joel Becker wrote:
> The slippery slope isn't important. POSIX specifies EAGAIN (you concede that), EBADF, and EINVAL against nbytes|offset|reqprio.
>The kernel does these checks already (where applicable).
>
EAGAIN doesn't require user space to fire any per-operation callbacks or
take any per-operation action. One merely needs to retry the io_submit().
The existing EBADF and EFAULT conditions indicate buggy code. Correctly
written programs will never encounter these. An appropriate action is
to assert or otherwise shut down the entire process. Similarly for
EINVAL against the reserved fields, aio_buf, and nbytes. I'll also
grant that EINVAL against offset and reqprio would also indicate buggy code.
EINVAL caused by a fd that doesn't support the particular opcode could
be encountered by a correctly written program which is given the wrong
sort of file by a user. I would prefer such errors be reported through
io_getevents() so they can be handled by the operation's callback/event
handling code.
> The kernel would have to be substantially more complex to report all
> errors that could possibly be detected during queuing.
It doesn't have to report all of them...obviously the application CAN
handle async completions, otherwise it wouldn't be io_submit'ing aio.
However, for the one that it can report, it is a nice optimization TO
actually report them.
> The kernel could
> even detect success during queuing if it really tried.
Yes, this is also a good thing, as i mentioned in my earlier message.
ie, if the io has ALREADY completed, return it immediately.
> This is not a reasonable requirement. A correctly written program has
> to be able to handle errors reported asynchronously. Why else would
> they be using an asynchronous API?
Yes, the program WILL be able to handle async completions, obviously, since
it attempting aio submissions. However, it MAY also be able to handle
synchronous/immediate completions of its aio submissions.
> So? That is a miniscule amount of resources used by an extremely rare
> condition. Such a picayune optimization hardly justifies the necessary
> increase in complexity.
It may not be miniscule. As we expand the ability to do aio to network
descriptors, the transport that we are utilizing could easily have a
"bcopy threshold" (a bcopy thresheld, for those that don't know, is the
threshold under which all io will be done synchronously. ie, it MAY be
more efficient to actually copy the data, instead of doing async io,
if the data is small). Thus, all transfers below the bcopy threshold will
incur this extra queuing overhead (etc), even tho they have ALREADY
completed! This will be for the NORMAL io path, NOT an error condition.
Also, there is no need for the app to have keep this memory around til
the async completion is returned, if the io already completed synchronously
(either a synchronous error, or an immediate/synchronous *successful* io
operation). This memory could be substantial, if the presentation protocol
the app has to adhere to requires a substantial amount of small packets
and/or there all many connections being serviced.
Scot McKinley wrote:
> > The kernel could
> > even detect success during queuing if it really tried.
>
> Yes, this is also a good thing, as i mentioned in my earlier message.
> ie, if the io has ALREADY completed, return it immediately.
io_submit() is incapable of returning operation success notifications.
It can only return notifications of idempotent errors, since callers who
submit multiple requests might need to call io_submit() twice to find
out what the error was.
> Yes, the program WILL be able to handle async completions, obviously,
> since
> it attempting aio submissions. However, it MAY also be able to handle
> synchronous/immediate completions of its aio submissions.
"MAY" is far cry from "MUST". I object strongly to requiring all
callers to io_submit() to be able to handle immediate completions. In
my aio framework, the caller of io_submit() is not in a context where it
can invoke completion callbacks, since completion callbacks are not
required to be reentrant.
Adding an optional facility to permit callers to receive immediate
completions is a different issue.
> > So? That is a miniscule amount of resources used by an extremely rare
> > condition. Such a picayune optimization hardly justifies the necessary
> > increase in complexity.
>
> It may not be miniscule.
For the specific conditions under discussion, it was. The conditions
were certainly extremely rare.
> As we expand the ability to do aio to network
> descriptors, the transport that we are utilizing could easily have a
> "bcopy threshold" (a bcopy thresheld, for those that don't know, is the
> threshold under which all io will be done synchronously.
The traditional way of dealing with this is to first call the
synchronous nonblocking interface, retrying with the asynchronous
interface only when the nonblocking one indicates no progress. Not only
does this save the cost of the (expensive when poll-based) queuing
overhead, it can save the cost of operations required only when going
async, such as copying the data from the stack to the heap.
> io_submit() is incapable of returning operation success notifications.
Exactly, that's why i proposed a new submission interface. ie, to
allow io_submit() to support the "always return async, even IF the
operation has ALREADY completed" paradigm, and another interface
to support the "return synchronous completions on submission"
paradigm.
> "MAY" is far cry from "MUST". I object strongly to requiring all
> callers to io_submit() to be able to handle immediate completions. In
> my aio framework, the caller of io_submit() is not in a context where it
> can invoke completion callbacks, since completion callbacks are not
> required to be reentrant.
Fine (see interfaces defined above).
> For the specific conditions under discussion, it was. The conditions
> were certainly extremely rare.
Yes, and we've moved past that, since there are other conditions which
are not as rare.
> The traditional way of dealing with this is to first call the
> synchronous nonblocking interface, retrying with the asynchronous
> interface only when the nonblocking one indicates no progress.
Great...i am glad that we atleast agree that the interface is necessary,
tho maybe not on its makeup.
The issue i brought up (bcopy threshold), is not a non-blocking issue,
and the above is not the "traditional", nor the correct way of dealing w/
it. The app should NOT need to make multiple sys-calls to initiate
the io. By far the majority of the existly network aio api's simply
return an indication of the immediate/synchronous completion as a
return indication from a *single* submission routine. There is no
reason why we cannot, also.
Regards, -sm