2006-08-11 19:48:18

by Ulrich Drepper

[permalink] [raw]
Subject: Re: [3/4] kevent: AIO, aio_sendfile() implementation.

Sébastien Dugué wrote:
> aio completion notification

I looked over this now but I don't think I understand everything. Or I
don't see how it all is integrated. And no, I'm not looking at the
proposed glibc code since would mean being tainted.


> Details:
> -------
>
> A struct sigevent *aio_sigeventp is added to struct iocb in
> include/linux/aio_abi.h
>
> An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> include/linux/aio.h:
>
> - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> requesting thread
>
> - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> specifi thread.

This has been proved to be sufficient in the timer code which basically
has the same problem. But why do you need separate constants? We have
the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use
these constants for the values of ki_notify.


> The following fields are added to struct kiocb in include/linux/aio.h:
>
> - pid_t ki_pid: target of the signal
>
> - __u16 ki_signo: signal number
>
> - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> IO_NOTIFY_THREAD_ID
>
> - uid_t ki_uid, ki_euid: filled with the submitter credentials

These two fields aren't needed for the POSIX interfaces. Where does the
requirement come from? I don't say they should be removed, they might
be useful, but if the costs are non-negligible then they could go away.


> - check whether the submitting thread wants to be notified directly
> (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> to another thread.
> In the latter case a check is made to assert that the target thread
> is in the same thread group

Is this really how it's implemented? This is not how it should be.
Either a signal is sent to a specific thread in the same process (this
is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
process. Sending a signal to the process means that from the kernel's
POV any thread which doesn't have the signal blocked can receive it.
The final decision is made by the kernel. There is no mechanism to send
the signal to another process.

So, for the purpose of the POSIX AIO code the ki_pid value is only
needed when the SIGEV_THREAD_ID bit is set.

It could be an extension and I don't mind it being introduced. But
again, it's not necessary and if it adds costs then it could be left
out. It is something which could easily be introduced later if the need
arises.


> listio support
>

I really don't understand the kernel interface for this feature.


> Details:
> -------
>
> An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
>
> A struct lio_event is added in include/linux/aio.h
>
> A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h

So you have a pointer in the structure for the individual requests. I
assume you use the atomic counter to trigger the final delivery. I
further assume that if lio_wait is set the calling thread is suspended
until all requests are handled and that the final notification in this
case means that thread gets woken.

This is all fine.

But how do you pass the requests to the kernel? If you have a new
lio_listio-like syscall it'll be easy. But I haven't seen anything like
this mentioned.

The alternative is to pass the requests one-by-one in which case I don't
see how you create the reference to the lio_listio control block. This
approach seems to be slower.

If all requests are passed at once, do you have the equivalent of
LIO_NOP entries?


How can we support the extension where we wait for a number of requests
which need not be all of them. I.e., I submit N requests and want to be
notified when at least M (M <= N) notified. I am not yet clear about
the actual semantics we should implement (e.g., do we send another
notification after the first one?) but it's something which IMO should
be taken into account in the design.


Finally, and this is very important, does you code send out the
individual requests notification and then in the end the lio_listio
completion? I think Suparna wrote this is the case but I want to make sure.


Overall, this looks much better than the old code. If the answers to my
questions show that the behavior is compatible with the POSIX AIO code
I'm certainly very much in favor of adding the kernel code.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2006-08-12 18:29:03

by Suparna Bhattacharya

[permalink] [raw]
Subject: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)


BTW, if anyone would like to be dropped off this growing cc list, please
let us know.

On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote:
> Sébastien Dugué wrote:
> > aio completion notification
>
> I looked over this now but I don't think I understand everything. Or I
> don't see how it all is integrated. And no, I'm not looking at the
> proposed glibc code since would mean being tainted.

Oh, I didn't realise that.
I'll make an attempt to clarify parts that I understand based on what I
have gleaned from my reading of the code and intent, but hopefully Sebastien,
Ben, Zach et al will be able to pitch in for a more accurate and complete
picture.

>
>
> > Details:
> > -------
> >
> > A struct sigevent *aio_sigeventp is added to struct iocb in
> > include/linux/aio_abi.h
> >
> > An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> > include/linux/aio.h:
> >
> > - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> > requesting thread
> >
> > - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> > specifi thread.
>
> This has been proved to be sufficient in the timer code which basically
> has the same problem. But why do you need separate constants? We have
> the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use
> these constants for the values of ki_notify.
>

I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
part of the ABI, but only internal to the kernel implementation. I think
Zach had suggested inferring THREAD_ID notification if the pid specified
is not zero. But, I don't see why ->sigev_notify couldn't used directly
(just like the POSIX timers code does) thus doing away with the
new constants altogether. Sebestian/Laurent, do you recall?

>
> > The following fields are added to struct kiocb in include/linux/aio.h:
> >
> > - pid_t ki_pid: target of the signal
> >
> > - __u16 ki_signo: signal number
> >
> > - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> > IO_NOTIFY_THREAD_ID
> >
> > - uid_t ki_uid, ki_euid: filled with the submitter credentials
>
> These two fields aren't needed for the POSIX interfaces. Where does the
> requirement come from? I don't say they should be removed, they might
> be useful, but if the costs are non-negligible then they could go away.

I'm guessing they are being used for validation of permissions at the time
of sending the signal, but maybe saving the task pointer in the iocb instead
of the pid would suffice ?

>
>
> > - check whether the submitting thread wants to be notified directly
> > (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> > to another thread.
> > In the latter case a check is made to assert that the target thread
> > is in the same thread group
>
> Is this really how it's implemented? This is not how it should be.
> Either a signal is sent to a specific thread in the same process (this
> is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
> process. Sending a signal to the process means that from the kernel's
> POV any thread which doesn't have the signal blocked can receive it.
> The final decision is made by the kernel. There is no mechanism to send
> the signal to another process.

The code seems to be set up to call specific_send_sig_info() in the case
of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the
intended behaviour is as you describe it should be (__group_send_sig_info
does the equivalent of sending a signal to the process and so any thread
which doesn't have signals blocked can receive it, while specific_send_sig_info
sends it to a particular thread).

But, I should really leave it to Sebestian to confirm that.

>
> So, for the purpose of the POSIX AIO code the ki_pid value is only
> needed when the SIGEV_THREAD_ID bit is set.
>
> It could be an extension and I don't mind it being introduced. But
> again, it's not necessary and if it adds costs then it could be left
> out. It is something which could easily be introduced later if the need
> arises.
>
>
> > listio support
> >
>
> I really don't understand the kernel interface for this feature.

I'm sorry this is confusing. This probably means that we need to
separate the external interface description more clearly and completely
from the internals.

>
>
> > Details:
> > -------
> >
> > An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
> >
> > A struct lio_event is added in include/linux/aio.h
> >
> > A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
>
> So you have a pointer in the structure for the individual requests. I
> assume you use the atomic counter to trigger the final delivery. I
> further assume that if lio_wait is set the calling thread is suspended
> until all requests are handled and that the final notification in this
> case means that thread gets woken.
>
> This is all fine.
>
> But how do you pass the requests to the kernel? If you have a new
> lio_listio-like syscall it'll be easy. But I haven't seen anything like
> this mentioned.
>
> The alternative is to pass the requests one-by-one in which case I don't
> see how you create the reference to the lio_listio control block. This
> approach seems to be slower.

The way it works (and better ideas are welcome) is that, since the io_submit()
syscall already accepts an array of iocbs[], no new syscall was introduced.
To implement lio_listio, one has to set up such an array, with the first iocb
in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
specifies the sigev notification to be associated with group completion
(a NULL value of the sigev notification pointer would imply equivalent of
LIO_WAIT). The following iocbs in the array should correspond to the set of
listio aiocbs. Whenever it encounters an IOCB_CMD_GROUP iocb opcode, the
kernel would interpret all subsequent iocbs[] submitted in the same
io_submit() call to be associated with the same lio control block.

Does that clarify ?

Would an example help ?

>
> If all requests are passed at once, do you have the equivalent of
> LIO_NOP entries?
>

Good question - we do have an IOCB_CMD_NOOP defined, and I seem to even
recall a patch that implemented it, but am wondering if it ever got merged.
Ben/Zach ?

>
> How can we support the extension where we wait for a number of requests
> which need not be all of them. I.e., I submit N requests and want to be
> notified when at least M (M <= N) notified. I am not yet clear about
> the actual semantics we should implement (e.g., do we send another
> notification after the first one?) but it's something which IMO should
> be taken into account in the design.
>

My thought here was that it should be possible to include M as a parameter
to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
block ... then whatever semantics are agreed upon can be implemented.

>
> Finally, and this is very important, does you code send out the
> individual requests notification and then in the end the lio_listio
> completion? I think Suparna wrote this is the case but I want to make sure.

Sebestian, could you confirm ?

>
>
> Overall, this looks much better than the old code. If the answers to my
> questions show that the behavior is compatible with the POSIX AIO code
> I'm certainly very much in favor of adding the kernel code.

Thanks a lot for looking through this !
Let us know what you think about the listio interface ... hopefully the
other issues are mostly simple to resolve.

Regards
Suparna

>
> --
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
>



--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2006-08-12 19:11:26

by Ulrich Drepper

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

Suparna Bhattacharya wrote:
> I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> part of the ABI, but only internal to the kernel implementation. I think
> Zach had suggested inferring THREAD_ID notification if the pid specified
> is not zero. But, I don't see why ->sigev_notify couldn't used directly
> (just like the POSIX timers code does) thus doing away with the
> new constants altogether. Sebestian/Laurent, do you recall?

I suggest to model the implementation after the timer code which does
exactly what we need.


> I'm guessing they are being used for validation of permissions at the time
> of sending the signal, but maybe saving the task pointer in the iocb instead
> of the pid would suffice ?

Why should any verification be necessary? The requests are generated in
the same process which will receive the notification. Even if the POSIX
process (aka, kernel process group) changes the IDs the notifications
should be set. The key is that notifications cannot be sent to another
POSIX process.

Adding this as a feature just makes things so much more complicated.


> So I think the
> intended behaviour is as you describe it should be

Then the documentation needs to be adjusted.


> The way it works (and better ideas are welcome) is that, since the io_submit()
> syscall already accepts an array of iocbs[], no new syscall was introduced.
> To implement lio_listio, one has to set up such an array, with the first iocb
> in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> specifies the sigev notification to be associated with group completion
> (a NULL value of the sigev notification pointer would imply equivalent of
> LIO_WAIT).

OK, this seems OK. We have to construct the iocb arrays dynamically anyway.


> My thought here was that it should be possible to include M as a parameter
> to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> block ... then whatever semantics are agreed upon can be implemented.

If you have room for the parameter this is fine. For the beginning we
can enforce the number to be the same as the total number of requests.


> Let us know what you think about the listio interface ... hopefully the
> other issues are mostly simple to resolve.

It should be fine and I would support adding all this assuming the
normal file support (as opposed to direct I/O only) is added, too.


But I have one last question: sockets, pipes and the like are already
supported, right? If this is not the case we have a problem with the
currently proposed lio_listio interface.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2006-08-12 19:28:26

by Jakub Jelinek

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the
> > new constants altogether. Sebestian/Laurent, do you recall?
>
> I suggest to model the implementation after the timer code which does
> exactly what we need.

Yeah, and if at all possible we want to use just one helper thread for
SIGEV_THREAD notification of timers/aio/etc., so it really should behave the
same as timer thread notification.

Jakub

2006-08-14 07:01:31

by Suparna Bhattacharya

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the
> > new constants altogether. Sebestian/Laurent, do you recall?
>
> I suggest to model the implementation after the timer code which does
> exactly what we need.

Agreed.

>
>
> > I'm guessing they are being used for validation of permissions at the time
> > of sending the signal, but maybe saving the task pointer in the iocb instead
> > of the pid would suffice ?
>
> Why should any verification be necessary? The requests are generated in
> the same process which will receive the notification. Even if the POSIX
> process (aka, kernel process group) changes the IDs the notifications
> should be set. The key is that notifications cannot be sent to another
> POSIX process.

Is there a (remote) possibility that the thread could have died and its
pid got reused by a new thread in another process ? Or is there a mechanism
that prevents such a possibility from arising (not just in NPTL library,
but at the kernel level) ?

I think the timer code saves a reference to the task pointer instead of
the pid, which is what I was suggesting above (instead of the euid checks),
as way to avoid the above situation.

>
> Adding this as a feature just makes things so much more complicated.
>
>
> > So I think the
> > intended behaviour is as you describe it should be
>
> Then the documentation needs to be adjusted.

*Nod*

>
>
> > The way it works (and better ideas are welcome) is that, since the io_submit()
> > syscall already accepts an array of iocbs[], no new syscall was introduced.
> > To implement lio_listio, one has to set up such an array, with the first iocb
> > in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> > specifies the sigev notification to be associated with group completion
> > (a NULL value of the sigev notification pointer would imply equivalent of
> > LIO_WAIT).
>
> OK, this seems OK. We have to construct the iocb arrays dynamically anyway.
>
>
> > My thought here was that it should be possible to include M as a parameter
> > to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> > block ... then whatever semantics are agreed upon can be implemented.
>
> If you have room for the parameter this is fine. For the beginning we
> can enforce the number to be the same as the total number of requests.
>

Sounds good.

>
> > Let us know what you think about the listio interface ... hopefully the
> > other issues are mostly simple to resolve.
>
> It should be fine and I would support adding all this assuming the
> normal file support (as opposed to direct I/O only) is added, too.

OK. I updated my patchset against 2618-rc3 just after OLS.

>
>
> But I have one last question: sockets, pipes and the like are already
> supported, right? If this is not the case we have a problem with the
> currently proposed lio_listio interface.

AIO for pipes should not be a problem - Chris Mason had a patch, so we can
just bring it up to the current levels, possibly with some additional
improvements.

I'm not sure what would be the right thing to do for the sockets case. While
we could put together a patch for basic aio_read/write (based on the same
model used for files), given the whole ongoing kevent effort, its not yet
clear to me what would make the most sense ...

Ben had a patch to do a fallback to kernel threads for AIO operations that
are not yet supported natively. I had some concerns about the approach, but
I guess he had intended it as an interim path for cases like this.

Suggestions would be much appreciated ? DaveM, Ingo, Andrew ?

Regards
Suparna

>
> --
> ➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖
>



--
Suparna Bhattacharya ([email protected])
Linux Technology Center
IBM Software Lab, India

2006-08-14 16:39:58

by Ulrich Drepper

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

Suparna Bhattacharya wrote:
> Is there a (remote) possibility that the thread could have died and its
> pid got reused by a new thread in another process ? Or is there a mechanism
> that prevents such a possibility from arising (not just in NPTL library,
> but at the kernel level) ?

The UID/GID won't help you with dying processes. What if the same user
creates a process with the same PID? That process will not expect the
notification and mustn't receive it. If you cannot detect whether the
issuing process died you have problems which cannot be solved with a
uid/gid pair.


> AIO for pipes should not be a problem - Chris Mason had a patch, so we can
> just bring it up to the current levels, possibly with some additional
> improvements.

Good.


> I'm not sure what would be the right thing to do for the sockets case. While
> we could put together a patch for basic aio_read/write (based on the same
> model used for files), given the whole ongoing kevent effort, its not yet
> clear to me what would make the most sense ...
>
> Ben had a patch to do a fallback to kernel threads for AIO operations that
> are not yet supported natively. I had some concerns about the approach, but
> I guess he had intended it as an interim path for cases like this.

A fallback solution would be sufficient. Nobody _should_ use POSIX AIO
for networking but people do and just giving them something that works
is good enough. It cannot really be worse than the userlevel emulation
we have know.

The alternative, separately and sequentially handling network sockets at
userlevel is horrible. We'd have to go over every file descriptor and
check whether it's a socket and then take if out of the request list for
the kernel. Then they need to be handled separately before or after the
kernel AIO code. This would punish unduly all the 99.9% of the programs
which don't use POSIX AIO for network I/O.

--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖


Attachments:
signature.asc (251.00 B)
OpenPGP digital signature

2006-08-15 02:07:33

by Nicholas Miell

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

On Mon, 2006-08-14 at 09:38 -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > Is there a (remote) possibility that the thread could have died and its
> > pid got reused by a new thread in another process ? Or is there a mechanism
> > that prevents such a possibility from arising (not just in NPTL library,
> > but at the kernel level) ?
>
> The UID/GID won't help you with dying processes. What if the same user
> creates a process with the same PID? That process will not expect the
> notification and mustn't receive it. If you cannot detect whether the
> issuing process died you have problems which cannot be solved with a
> uid/gid pair.
>
>

Eric W. Biederman sent a series of patches that introduced a struct
task_ref specifically to solve this sort of problem on January 28 of
this year, but I don't think it went anywhere.


--
Nicholas Miell <[email protected]>

2006-09-04 14:29:15

by Sébastien Dugué

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

Hi,

just came back from vacation, sorry for the delay.

On Sat, 2006-08-12 at 23:59 +0530, Suparna Bhattacharya wrote:
> BTW, if anyone would like to be dropped off this growing cc list, please
> let us know.
>
> On Fri, Aug 11, 2006 at 12:45:55PM -0700, Ulrich Drepper wrote:
> > S?bastien Dugu? wrote:
> > > aio completion notification
> >
> > I looked over this now but I don't think I understand everything. Or I
> > don't see how it all is integrated. And no, I'm not looking at the
> > proposed glibc code since would mean being tainted.
>
> Oh, I didn't realise that.
> I'll make an attempt to clarify parts that I understand based on what I
> have gleaned from my reading of the code and intent, but hopefully Sebastien,
> Ben, Zach et al will be able to pitch in for a more accurate and complete
> picture.
>
> >
> >
> > > Details:
> > > -------
> > >
> > > A struct sigevent *aio_sigeventp is added to struct iocb in
> > > include/linux/aio_abi.h
> > >
> > > An enum {IO_NOTIFY_SIGNAL = 0, IO_NOTIFY_THREAD_ID = 1} is added in
> > > include/linux/aio.h:
> > >
> > > - IO_NOTIFY_SIGNAL means that the signal is to be sent to the
> > > requesting thread
> > >
> > > - IO_NOTIFY_THREAD_ID means that the signal is to be sent to a
> > > specifi thread.
> >
> > This has been proved to be sufficient in the timer code which basically
> > has the same problem. But why do you need separate constants? We have
> > the various SIGEV_* constants, among them SIGEV_THREAD_ID. Just use
> > these constants for the values of ki_notify.
> >
>
> I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> part of the ABI, but only internal to the kernel implementation. I think
> Zach had suggested inferring THREAD_ID notification if the pid specified
> is not zero. But, I don't see why ->sigev_notify couldn't used directly
> (just like the POSIX timers code does) thus doing away with the
> new constants altogether. Sebestian/Laurent, do you recall?

As I see it, those IO_NOTIFY_* constants are uneeded and we could use
->sigev_notify directly. I will change this so that we use the same
mechanism as the POSIX timers code.

>
> >
> > > The following fields are added to struct kiocb in include/linux/aio.h:
> > >
> > > - pid_t ki_pid: target of the signal
> > >
> > > - __u16 ki_signo: signal number
> > >
> > > - __u16 ki_notify: kind of notification, IO_NOTIFY_SIGNAL or
> > > IO_NOTIFY_THREAD_ID
> > >
> > > - uid_t ki_uid, ki_euid: filled with the submitter credentials
> >
> > These two fields aren't needed for the POSIX interfaces. Where does the
> > requirement come from? I don't say they should be removed, they might
> > be useful, but if the costs are non-negligible then they could go away.
>
> I'm guessing they are being used for validation of permissions at the time
> of sending the signal, but maybe saving the task pointer in the iocb instead
> of the pid would suffice ?

IIRC, Ben added these for that exact reason. Is this really needed?
Ben?

>
> >
> >
> > > - check whether the submitting thread wants to be notified directly
> > > (sigevent->sigev_notify_thread_id is 0) or wants the signal to be sent
> > > to another thread.
> > > In the latter case a check is made to assert that the target thread
> > > is in the same thread group
> >
> > Is this really how it's implemented? This is not how it should be.
> > Either a signal is sent to a specific thread in the same process (this
> > is what SIGEV_THREAD_ID is for) or the signal is sent to a calling
> > process. Sending a signal to the process means that from the kernel's
> > POV any thread which doesn't have the signal blocked can receive it.
> > The final decision is made by the kernel. There is no mechanism to send
> > the signal to another process.
>
> The code seems to be set up to call specific_send_sig_info() in the case
> of *_THREAD_ID , and __group_send_sig_info() otherwise. So I think the
> intended behaviour is as you describe it should be (__group_send_sig_info
> does the equivalent of sending a signal to the process and so any thread
> which doesn't have signals blocked can receive it, while specific_send_sig_info
> sends it to a particular thread).
>
> But, I should really leave it to Sebestian to confirm that.

That's right, but I think that part needs to be reworked to follow
the same logic as the POSIX timers.


> > > listio support
> > >
> >
> > I really don't understand the kernel interface for this feature.
>
> I'm sorry this is confusing. This probably means that we need to
> separate the external interface description more clearly and completely
> from the internals.
>
> >
> >
> > > Details:
> > > -------
> > >
> > > An IOCB_CMD_GROUP is added to the IOCB_CMD enum in include/linux/aio_abi.h
> > >
> > > A struct lio_event is added in include/linux/aio.h
> > >
> > > A struct lio_event *ki_lio is added to struct iocb in include/linux/aio.h
> >
> > So you have a pointer in the structure for the individual requests. I
> > assume you use the atomic counter to trigger the final delivery. I
> > further assume that if lio_wait is set the calling thread is suspended
> > until all requests are handled and that the final notification in this
> > case means that thread gets woken.
> >
> > This is all fine.
> >
> > But how do you pass the requests to the kernel? If you have a new
> > lio_listio-like syscall it'll be easy. But I haven't seen anything like
> > this mentioned.
> >
> > The alternative is to pass the requests one-by-one in which case I don't
> > see how you create the reference to the lio_listio control block. This
> > approach seems to be slower.
>
> The way it works (and better ideas are welcome) is that, since the io_submit()
> syscall already accepts an array of iocbs[], no new syscall was introduced.
> To implement lio_listio, one has to set up such an array, with the first iocb
> in the array having the special (new) grouping opcode of IOCB_CMD_GROUP which
> specifies the sigev notification to be associated with group completion
> (a NULL value of the sigev notification pointer would imply equivalent of
> LIO_WAIT). The following iocbs in the array should correspond to the set of
> listio aiocbs. Whenever it encounters an IOCB_CMD_GROUP iocb opcode, the
> kernel would interpret all subsequent iocbs[] submitted in the same
> io_submit() call to be associated with the same lio control block.
>
> Does that clarify ?
>
> Would an example help ?
>
> >
> > If all requests are passed at once, do you have the equivalent of
> > LIO_NOP entries?

So far, LIO_NOP entries are pruned by the support library
(libposix-aio) and never sent to the kernel.
> >
>
> Good question - we do have an IOCB_CMD_NOOP defined, and I seem to even
> recall a patch that implemented it, but am wondering if it ever got merged.
> Ben/Zach ?
>
> >
> > How can we support the extension where we wait for a number of requests
> > which need not be all of them. I.e., I submit N requests and want to be
> > notified when at least M (M <= N) notified. I am not yet clear about
> > the actual semantics we should implement (e.g., do we send another
> > notification after the first one?) but it's something which IMO should
> > be taken into account in the design.
> >
>
> My thought here was that it should be possible to include M as a parameter
> to the IOCB_CMD_GROUP opcode iocb, and thus incorporated in the lio control
> block ... then whatever semantics are agreed upon can be implemented.
>
> >
> > Finally, and this is very important, does you code send out the
> > individual requests notification and then in the end the lio_listio
> > completion? I think Suparna wrote this is the case but I want to make sure.
>
> Sebestian, could you confirm ?

If (and only if) the user did setup a sigevent for one or more
individual requests then those requests completion will trigger a
notification and in the end the list completion notification is sent.
Otherwise, only the list completion notification is sent.


--
-----------------------------------------------------

S?bastien Dugu? BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix
http://sourceforge.net/projects/paiol

-----------------------------------------------------

2006-09-04 14:36:50

by Sébastien Dugué

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

On Sat, 2006-08-12 at 12:10 -0700, Ulrich Drepper wrote:
> Suparna Bhattacharya wrote:
> > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > part of the ABI, but only internal to the kernel implementation. I think
> > Zach had suggested inferring THREAD_ID notification if the pid specified
> > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > (just like the POSIX timers code does) thus doing away with the
> > new constants altogether. Sebestian/Laurent, do you recall?
>
> I suggest to model the implementation after the timer code which does
> exactly what we need.
>

Will do.

>
> > I'm guessing they are being used for validation of permissions at the time
> > of sending the signal, but maybe saving the task pointer in the iocb instead
> > of the pid would suffice ?
>
> Why should any verification be necessary? The requests are generated in
> the same process which will receive the notification. Even if the POSIX
> process (aka, kernel process group) changes the IDs the notifications
> should be set. The key is that notifications cannot be sent to another
> POSIX process.
>
> Adding this as a feature just makes things so much more complicated.
>

Agreed.

S?bastien.


--
-----------------------------------------------------

S?bastien Dugu? BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix
http://sourceforge.net/projects/paiol

-----------------------------------------------------

2006-09-04 14:38:08

by Sébastien Dugué

[permalink] [raw]
Subject: Re: Kernel patches enabling better POSIX AIO (Was Re: [3/4] kevent: AIO, aio_sendfile)

On Sat, 2006-08-12 at 15:28 -0400, Jakub Jelinek wrote:
> On Sat, Aug 12, 2006 at 12:10:35PM -0700, Ulrich Drepper wrote:
> > > I am wondering about that too. IIRC, the IO_NOTIFY_* constants are not
> > > part of the ABI, but only internal to the kernel implementation. I think
> > > Zach had suggested inferring THREAD_ID notification if the pid specified
> > > is not zero. But, I don't see why ->sigev_notify couldn't used directly
> > > (just like the POSIX timers code does) thus doing away with the
> > > new constants altogether. Sebestian/Laurent, do you recall?
> >
> > I suggest to model the implementation after the timer code which does
> > exactly what we need.
>
> Yeah, and if at all possible we want to use just one helper thread for
> SIGEV_THREAD notification of timers/aio/etc., so it really should behave the
> same as timer thread notification.
>

That's exactly what is done in libposix-aio.

S?bastien.

--
-----------------------------------------------------

S?bastien Dugu? BULL/FREC:B1-247
phone: (+33) 476 29 77 70 Bullcom: 229-7770

mailto:[email protected]

Linux POSIX AIO: http://www.bullopensource.org/posix
http://sourceforge.net/projects/paiol

-----------------------------------------------------