2009-10-20 21:58:15

by Zubin Dittia

Subject: libaio asynchronous syscall io_getevents blocks on error

I'm writing a program that uses the kernel's io_submit/io_getevents
system calls. What I would like to be able to do is submit N
operations for i/o on different files, and then call io_getevents with
min_nr = nr = N and a timeout of NULL, so that I can block until all N
operations have completed. This works great, except when one of the
operations has an error (e.g., if one of the descriptors is invalid).
In this case, the call to io_getevents appears to block indefinitely.
Shouldn't an error on one of the submitted operations count as a
completion event for that operation, so I can check the error code
when the call returns? Any help would be appreciated.
Thanks in advance,
-Zubin


2009-10-20 22:25:08

by Jeff Moyer

Subject: Re: libaio asynchronous syscall io_getevents blocks on error

Zubin Dittia <[email protected]> writes:

> I'm writing a program that uses the kernel's io_submit/io_getevents
> system calls.  What I would like to be able to do is submit N
> operations for i/o on different files, and then call io_getevents with
> min_nr = nr = N and a timeout of NULL, so that I can block until all N
> operations have completed.  This works great, except when one of the
> operations has an error (e.g., if one of the descriptors is invalid).
> In this case, the call to io_getevents appears to block indefinitely.
> Shouldn't an error on one of the submitted operations count as a
> completion event for that operation, so I can check the error code
> when the call returns?  Any help would be appreciated.

Did you check the return value of io_submit?

Cheers,
Jeff

2009-10-20 23:34:02

by Zubin Dittia

Subject: Re: libaio asynchronous syscall io_getevents blocks on error

On Tue, Oct 20, 2009 at 3:25 PM, Jeff Moyer <[email protected]> wrote:
> Zubin Dittia <[email protected]> writes:
>
>> I'm writing a program that uses the kernel's io_submit/io_getevents
>> system calls. What I would like to be able to do is submit N
>> operations for i/o on different files, and then call io_getevents with
>> min_nr = nr = N and a timeout of NULL, so that I can block until all N
>> operations have completed. This works great, except when one of the
>> operations has an error (e.g., if one of the descriptors is invalid).
>> In this case, the call to io_getevents appears to block indefinitely.
>> Shouldn't an error on one of the submitted operations count as a
>> completion event for that operation, so I can check the error code
>> when the call returns? Any help would be appreciated.
>
> Did you check the return value of io_submit?
>
> Cheers,
> Jeff
>


Duh. I was only checking whether it returned a negative error, not
whether it accepted fewer than all the I/Os I submitted.
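A minimal sketch of that check, assuming Linux and the raw system calls rather than the libaio wrappers. The helper names (aio_submit, aio_getevents, submit_and_wait) are hypothetical; the wrappers only convert failures to a negative errno, which is the convention libaio's io_submit uses. The point is to wait for the number of iocbs that were actually accepted, not the number passed in.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Hypothetical wrapper: raw io_submit, returning -errno on failure. */
static long aio_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
    long ret = syscall(SYS_io_submit, ctx, nr, iocbs);
    return ret < 0 ? -errno : ret;
}

/* Hypothetical wrapper: raw io_getevents with no timeout (blocks). */
static long aio_getevents(aio_context_t ctx, long min_nr, long nr,
                          struct io_event *events)
{
    long ret = syscall(SYS_io_getevents, ctx, min_nr, nr, events, NULL);
    return ret < 0 ? -errno : ret;
}

/*
 * Submit up to nr iocbs, then block until every ACCEPTED one completes.
 * *accepted receives how many iocbs io_submit queued. Returns the number
 * of events reaped, or a negative errno if nothing could be queued or
 * io_getevents failed.
 */
long submit_and_wait(aio_context_t ctx, struct iocb **iocbs, long nr,
                     struct io_event *events, long *accepted)
{
    assert(nr > 0);
    *accepted = 0;

    long ret = aio_submit(ctx, nr, iocbs);
    if (ret < 0)
        return ret;            /* nothing was queued */
    *accepted = ret;           /* may be smaller than nr */

    /* Waiting for nr events here would block forever when ret < nr. */
    long reaped = 0;
    while (reaped < *accepted) {
        long want = *accepted - reaped;
        long got = aio_getevents(ctx, want, want, events + reaped);
        if (got == -EINTR)
            continue;          /* interrupted by a signal; retry */
        if (got < 0)
            return got;
        reaped += got;
    }
    return reaped;
}
```

When *accepted comes back smaller than nr, the iocbs from index *accepted onward were never queued and will never produce a completion event.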

But this does bring up the interesting question of how to know which
of the I/Os I submitted had an error, and what the error was. Does it
mean I have to call io_submit once for each I/O operation? If so, why
does io_submit take an array argument at all?

Thanks for your help,
-Zubin

PS: It does seem a little strange that io_submit returns an error if
the first IOCB is invalid but not when any of the other IOCBs are
invalid (this appears to be the case, at least according to the man
page).

2009-10-20 23:59:04

by Jeff Moyer

Subject: Re: libaio asynchronous syscall io_getevents blocks on error

Zubin Dittia <[email protected]> writes:

> But this does bring up the interesting question of how to know which
> of the I/Os I submitted had an error, and what the error was. Does it

If io_submit returned N, then the (N+1)th iocb had an error.

> PS: It does seem a little strange that io_submit returns an error if
> the first IOCB is invalid but not when any of the other IOCBs are
> invalid (this appears to be the case, at least according to the man
> page).

It makes sense that it tells you how many it could successfully submit.
If it can't submit any, you get the error from the first I/O.

Cheers,
Jeff
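The rule above can be written down as a small helper. This is only a sketch: interpret_submit and struct submit_report are hypothetical names, and ret is assumed to follow the libaio convention (count accepted, or a negative errno). Indices are 0-based, so "the (N+1)th iocb" is index N.

```c
#include <assert.h>

enum submit_outcome { SUBMIT_ALL, SUBMIT_PARTIAL, SUBMIT_NONE };

struct submit_report {
    enum submit_outcome outcome;
    long accepted;      /* iocbs the kernel queued */
    long first_failed;  /* 0-based index of the rejected iocb, or -1 */
    int  err;           /* errno when io_submit reported one, else 0 */
};

/*
 * Interpret the return value of an io_submit call that was given nr iocbs.
 * ret is the count accepted, or a negative errno (libaio convention).
 */
struct submit_report interpret_submit(long ret, long nr)
{
    struct submit_report r;

    if (ret < 0) {
        /* Nothing was queued. The error is normally the first iocb's,
         * but it can also describe the call itself (for example an
         * invalid context). */
        r.outcome = SUBMIT_NONE;
        r.accepted = 0;
        r.first_failed = 0;
        r.err = (int)-ret;
    } else if (ret < nr) {
        /* iocbs[0..ret-1] were queued; iocbs[ret] was rejected. The
         * reason is not reported here; resubmitting from index ret
         * returns it as a negative value. */
        r.outcome = SUBMIT_PARTIAL;
        r.accepted = ret;
        r.first_failed = ret;
        r.err = 0;
    } else {
        r.outcome = SUBMIT_ALL;
        r.accepted = nr;
        r.first_failed = -1;
        r.err = 0;
    }
    return r;
}
```

For example, a return of 2 from a call with 5 iocbs means indices 0 and 1 are in flight and index 2 was rejected; indices 3 and 4 were not examined.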

2009-10-21 00:10:52

by Zubin Dittia

Subject: Re: libaio asynchronous syscall io_getevents blocks on error

Thanks, Jeff, this makes sense. In order to get error codes for all
I/Os that are being submitted, I'll have to loop calling io_submit on
successively smaller portions of the initial array of requests, each
time trimming off the ones that have been successfully submitted.
-Zubin
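A rough sketch of that loop, assuming Linux and the raw io_submit system call instead of libaio. The names aio_submit and submit_all are hypothetical, and so is the policy of skipping a rejected iocb and carrying on with the rest. Each pass submits the remaining tail of the array; a negative return is charged to the first iocb of that tail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/aio_abi.h>

/* Hypothetical wrapper: raw io_submit, returning -errno on failure. */
static long aio_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
    long ret = syscall(SYS_io_submit, ctx, nr, iocbs);
    return ret < 0 ? -errno : ret;
}

/*
 * Try to submit all nr iocbs, skipping any that the kernel rejects.
 * On return, errs[i] is 0 if iocbs[i] was queued, otherwise the errno
 * reported for it. Returns the number of iocbs queued.
 */
long submit_all(aio_context_t ctx, struct iocb **iocbs, long nr, int *errs)
{
    assert(nr >= 0 && errs != NULL);

    long queued = 0;
    long pos = 0;                       /* first iocb not yet handled */

    while (pos < nr) {
        long ret = aio_submit(ctx, nr - pos, iocbs + pos);
        if (ret > 0) {
            /* iocbs[pos .. pos+ret-1] are now in flight. */
            for (long i = 0; i < ret; i++)
                errs[pos + i] = 0;
            queued += ret;
            pos += ret;
        } else {
            /* Nothing accepted: the error belongs to iocbs[pos] (or to
             * the call itself, in which case every remaining iocb ends
             * up with the same code). A return of 0 is not expected for
             * a non-empty tail; it is recorded as EAGAIN so that the
             * loop always advances. */
            errs[pos] = ret < 0 ? (int)-ret : EAGAIN;
            pos++;
        }
    }
    return queued;
}
```

The return value is then the right min_nr for the following io_getevents call, since only queued iocbs generate completion events. A caller might prefer to retry EAGAIN entries later rather than treat them as failed.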

On Tue, Oct 20, 2009 at 4:59 PM, Jeff Moyer <[email protected]> wrote:
> Zubin Dittia <[email protected]> writes:
>
>> But this does bring up the interesting question of how to know which
>> of the I/Os I submitted had an error, and what the error was. Does it
>
> If io_submit returned N, then the (N+1)th iocb had an error.
>
>> PS: It does seem a little strange that io_submit returns an error if
>> the first IOCB is invalid but not when any of the other IOCBs are
>> invalid (this appears to be the case, at least according to the man
>> page).
>
> It makes sense that it tells you how many it could successfully submit.
> If it can't submit any, you get the error from the first I/O.
>
> Cheers,
> Jeff
>