The following documents my understanding of the differences between
the epoll and aio event frameworks. These differences stem from the
fact that epoll is designed for use by single threaded callers,
whereas aio is designed for use by multithreaded, thread pool callers.
I do not intend to criticise either design choice--each model (single
threaded vs. thread pool) has its uses and each model has requirements
of the event framework which conflict with the requirements of the
other.
The single threaded model has the advantage of more efficient use of a
single CPU and its associated cache. A single threaded caller also
tends to have fewer locking issues to deal with. As a result,
correctly written single threaded code tends to have a higher
throughput per CPU than thread pool code.
The thread pool model permits the application developer to write
blocking code. Asynchronous code takes more time to write and debug,
especially when one is starting from an existing code base with
blocking code and when one needs to use third party libraries with
blocking APIs. The thread pool model permits one to code
asynchronously only that 5% of the code where the program waits over
95% of the time, leaving the worker threads to deal with the rest.
The reduction in throughput one pays over the single threaded model is
effectively insurance against having the entire server stalled by an
overlooked blocking call or a page fault.
The biggest difference between the two frameworks is in the
cancellation semantics. epoll gives single threaded callers a
guarantee that after a legitimate cancel (EPOLL_CTL_DEL) operation
returns to the caller there is no possibility of an in-progress event
for the canceled request being delivered through a subsequent call to
epoll_wait(). This meets a desire for single threaded callers to
not have to deal with cancel/complete races and permits them to
immediately free their application-side per-connection state.
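A minimal sketch of the guarantee described above, assuming Python's select.epoll wrapper (where unregister() issues EPOLL_CTL_DEL) and a socketpair standing in for a connection. The function name is made up for illustration. It makes an event pending, deletes the fd, and then checks what a later wait reports.

```python
import select
import socket


def events_after_delete():
    """Return (events seen before the delete, events seen after it)."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN)
        writer.send(b"x")                # an event is now pending for reader
        pending = ep.poll(0)             # visible before the delete
        ep.unregister(reader.fileno())   # EPOLL_CTL_DEL
        after = ep.poll(0)               # the pending event is gone
        return pending, after
    finally:
        ep.close()
        reader.close()
        writer.close()
```

Once unregister() has returned, a single threaded caller can drop its per-connection state without checking for a late event.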
Thread pool applications, on the other hand, have to deal with
cancel/complete races anyway. Some other thread could have read the
event immediately before the cancel call. For this reason, aio cancel
does not bother removing pending completions from the event ring.
When io_cancel() is called on an operation that has already delivered
its completion event, has a completion event in the ring, or is not
cancelable, it returns -EAGAIN. A thread pool application can deal
with this easily by waiting on a condition variable which is signaled
by the thread that picked up the event. A single threaded application
cannot block, so has to handle this condition by writing asynchronous
tear-down code.
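The condition variable approach can be sketched as below. This is a pure simulation under stated assumptions: try_cancel() is a hypothetical stand-in for io_cancel() that only models the -EAGAIN outcome, and Request, worker() and cancel_and_free() are illustrative names, not part of any aio API.

```python
import threading


class Request:
    """Hypothetical per-connection state shared by the pool threads."""

    def __init__(self):
        self.cond = threading.Condition()
        self.completed = False   # set by the thread that picked up the event
        self.freed = False


def try_cancel(event_already_taken):
    """Stand-in for io_cancel(): fails once the completion event is out."""
    return "EAGAIN" if event_already_taken else "OK"


def worker(req, log):
    # This thread has already read the completion event from the ring.
    log.append("worker: handling completion")
    with req.cond:
        req.completed = True
        req.cond.notify_all()


def cancel_and_free(req, log, event_already_taken):
    if try_cancel(event_already_taken) == "EAGAIN":
        # Lost the race: block until the worker is done with the request.
        with req.cond:
            req.cond.wait_for(lambda: req.completed)
    req.freed = True
    log.append("canceller: state freed")


def demo():
    req, log = Request(), []
    canceller = threading.Thread(target=cancel_and_free, args=(req, log, True))
    pool_thread = threading.Thread(target=worker, args=(req, log))
    canceller.start()
    pool_thread.start()
    canceller.join()
    pool_thread.join()
    return log
```

The canceller simply blocks, which is exactly what a single threaded caller cannot afford to do.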
Note that the fact that aio supports uncancelable operations (such as
every aio operation currently implemented in the base kernel) means
that single threaded callers which use such operations would need to
write this asynchronous tear-down code anyway. Should aio later add
any cancelable operations that wouldn't be available through epoll, it
may want to add for single threaded callers a variant of io_cancel()
that removes any associated event from the ring.
Another difference between the two frameworks is in the prevention (or
lack thereof) of multiple simultaneous events for an operation/fd.
epoll effectively assumes that its caller will finish processing a
returned event before making a subsequent call to epoll_wait(). If
multiple threads each call epoll_wait() on the same eventpoll fd, the
application can easily end up with multiple threads simultaneously
handling identical events. Worse, the application cannot tell how
many of these duplicate events are outstanding, so tear-down becomes
nigh impossible. epoll_wait() was designed to only be called from a
single thread and it uses this design aspect to optimize for its usual
case of a single submission/add generating multiple events.
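A small sketch of the duplicate delivery described above, assuming Python's select.epoll wrapper and a level triggered registration. To keep it deterministic the waits are made one after another from a single thread; each poll() stands in for a different thread's epoll_wait() issued before anyone has read the data. The function name is illustrative only.

```python
import select
import socket


def duplicate_deliveries(callers=3):
    """Return how many events each simulated epoll_wait() caller receives."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN)  # level triggered default
        writer.send(b"request")
        # Nobody has consumed the data yet, so every caller is handed the
        # same readiness event for the same fd.
        return [len(ep.poll(0)) for _ in range(callers)]
    finally:
        ep.close()
        reader.close()
        writer.close()
```

Nothing in what the callers receive says how many other copies of the event are outstanding.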
aio keeps a one event per submission rule to avoid such problems for
thread pool callers. The tradeoff is that applications that want
subsequent events have to keep repaying the cost of submission.
Should the cost of submission turn out to be significant (I'm not
convinced it is with respect to the cost of handling the event) some
of this could be amortized by extending the aio framework with a
method for a thread which has obtained an intermediate event to
"re-arm" the operation once the thread has finished processing it.
On Mon, 19 May 2003, John Myers wrote:
> The following documents my understanding of the differences between
> the epoll and aio event frameworks. These differences stem from the
> fact that epoll is designed for use by single threaded callers,
> whereas aio is designed for use by multithreaded, thread pool callers.
>
> I do not intend to criticise either design choice--each model (single
> threaded vs. thread pool) has its uses and each model has requirements
> of the event framework which conflict with the requirements of the
> other.
Hi John, you seem to have missed a few episodes of the epoll saga. You can
use epoll in either Edge Triggered or Level Triggered mode, and in LT mode
epoll is basically a super-poll. You can call it with blocking and
non-blocking fds. You can call it from many threads and (with LT mode) you
don't even need to reach EAGAIN (actually even with ET you don't need to
reach EAGAIN, but I'm not willing to restart discussions that have already
happened 25 times on lkml). You can easily do thread pooling also. As
a matter of fact, a pretty famous online gaming company is using epoll
together with a thread pooling implementation, and the last time they
contacted me they were easily handling more than 150K fds with that
model. John, do not cast an API as working in only a single environment. Is
poll/select a single threading API ? A thread pooling one ? I'd say both,
since you can choose the model that better fits your needs. Same thing for
epoll. About the single shot feature, I had a discussion here with the
guy who wrote kqueue, and I was telling him about my wish to keep epoll as
simple as possible, since people worked with poll/select for many years and
they did not commit suicide because of the lack of extended features.
Adding a single shot feature to epoll takes about 5 lines of code,
comments included :) You know how many requests I had ? Zero, nada.
- Davide
Davide Libenzi wrote:
> Adding a single shot feature to epoll takes about 5 lines of code,
> comments included :) You know how many requests I had ? Zero, nada.
I thought edge triggered epoll *was* single-shot.
- Dan
--
Dan Kegel
http://www.kegel.com
http://counter.li.org/cgi-bin/runscript/display-person.cgi?user=78045
On Mon, 19 May 2003, Dan Kegel wrote:
> Davide Libenzi wrote:
> > Adding a single shot feature to epoll takes about 5 lines of code,
> > comments included :) You know how many requests I had ? Zero, nada.
>
> I thought edge triggered epoll *was* single-shot.
For single shot I mean that once you receive one event, you will not
receive more events for that fd if you do not rearm it. Suppose you
receive 1000 bytes of data and you get an event (EPOLLIN). If after 10
seconds you receive another 1000 bytes, you will receive another event.
This is not single shot.
- Davide
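The distinction drawn above can be sketched as follows, assuming Python's select.epoll wrapper and a kernel recent enough to provide the EPOLLONESHOT flag (which is not part of the epoll being discussed in this thread). count_events() is an illustrative name. It delivers two separate 1000-byte arrivals and counts the events reported after each one.

```python
import select
import socket


def count_events(flags, rearm=False):
    """Return the number of events seen after each of two arrivals."""
    reader, writer = socket.socketpair()
    ep = select.epoll()
    try:
        ep.register(reader.fileno(), select.EPOLLIN | flags)
        seen = []
        for _ in range(2):
            writer.send(b"x" * 1000)          # another 1000 bytes arrive
            seen.append(len(ep.poll(0)))
            if rearm:
                # EPOLL_CTL_MOD re-enables an fd that one-shot disabled.
                ep.modify(reader.fileno(), select.EPOLLIN | flags)
        return seen
    finally:
        ep.close()
        reader.close()
        writer.close()
```

With select.EPOLLET the second arrival produces a second event. With select.EPOLLONESHOT it does not unless the fd is re-armed first.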
Davide Libenzi wrote:
>> Adding a single shot feature to epoll takes about 5 lines of code,
>> comments included :) You know how many requests I had ? Zero, nada.
On Mon, May 19, 2003 at 06:10:21PM -0700, Dan Kegel wrote:
> I thought edge triggered epoll *was* single-shot.
> - Dan
fs/eventpoll.c suggests "epoll" stands for "eventpoll" as opposed to
"edge-triggered". Davide, did the LT additions prompt the renaming or
was this always the case?
-- wli
On Mon, 19 May 2003, William Lee Irwin III wrote:
> Davide Libenzi wrote:
> >> Adding a single shot feature to epoll takes about 5 lines of code,
> >> comments included :) You know how many requests I had ? Zero, nada.
>
> On Mon, May 19, 2003 at 06:10:21PM -0700, Dan Kegel wrote:
> > I thought edge triggered epoll *was* single-shot.
> > - Dan
>
> fs/eventpoll.c suggests "epoll" stands for "eventpoll" as opposed to
> "edge-triggered". Davide, did the LT additions prompt the renaming or
> was this always the case?
It was both actually :) It meant event-poll and also was edge-triggered.
Now you can have it level-triggered on a per-fd basis. The epoll name was
not a good one from the beginning though :)
- Davide
On Mon, 19 May 2003, Dan Kegel wrote:
> > For single shot I mean that once you receive one event, you will not
> > receive more events for that fd if you do not rearm it. Suppose you
> > receive 1000 bytes of data and you get an event (EPOLLIN). If after 10
> > seconds you receive another 1000 bytes, you will receive another event.
> > This is not single shot.
>
> Oh, ok. I much prefer plain old edge triggered, anyway. It does
> the right thing with less fuss.
If someone shows a practical case where you cannot live without it,
implementing it is trivial.
- Davide
Davide Libenzi wrote:
> On Mon, 19 May 2003, Dan Kegel wrote:
>
>
>>Davide Libenzi wrote:
>>
>>>Adding a single shot feature to epoll takes about 5 lines of code,
>>>comments included :) You know how many requests I had ? Zero, nada.
>>
>>I thought edge triggered epoll *was* single-shot.
>
>
> For single shot I mean that once you receive one event, you will not
> receive more events for that fd if you do not rearm it. Suppose you
> receive 1000 bytes of data and you get an event (EPOLLIN). If after 10
> seconds you receive another 1000 bytes, you will receive another event.
> This is not single shot.
Oh, ok. I much prefer plain old edge triggered, anyway. It does
the right thing with less fuss.
- Dan
Davide Libenzi wrote:
>>> Adding a single shot feature to epoll takes about 5 lines of code,
>>> comments included :) You know how many requests I had ? Zero, nada.
On Mon, 19 May 2003, Dan Kegel wrote:
>> I thought edge triggered epoll *was* single-shot.
On Mon, May 19, 2003 at 05:47:15PM -0700, Davide Libenzi wrote:
> For single shot I mean that once you receive one event, you will not
> receive more events for that fd if you do not rearm it. Suppose you
> receive 1000 bytes of data and you get an event (EPOLLIN). If after 10
> seconds you receive another 1000 bytes, you will receive another event.
> This is not single shot.
I think this would be useful for network daemons that would like to
fairly schedule responses (i.e. not re-arm until a client on a given fd
deserves a turn again). IRC daemons would appear to be a perfect
candidate for such. OTOH you may want to wait until someone is writing
such a beast so "it will be used" instead of "it is potentially useful".
-- wli
William Lee Irwin III wrote:
> Davide Libenzi wrote:
>
>>>>Adding a single shot feature to epoll takes about 5 lines of code,
>>>>comments included :) You know how many requests I had ? Zero, nada.
>
>
> On Mon, 19 May 2003, Dan Kegel wrote:
>
>>>I thought edge triggered epoll *was* single-shot.
>
>
> On Mon, May 19, 2003 at 05:47:15PM -0700, Davide Libenzi wrote:
>
>>For single shot I mean that once you receive one event, you will not
>>receive more events for that fd if you do not rearm it. Suppose you
>>receive 1000 bytes of data and you get an event (EPOLLIN). If after 10
>>seconds you receive another 1000 bytes, you will receive another event.
>>This is not single shot.
>
>
> I think this would be useful for network daemons that would like to
> fairly schedule responses (i.e. not re-arm until a client on a given fd
> deserves a turn again). IRC daemons would appear to be a perfect
> candidate for such. ...
No need. The plain old edge triggered behavior can handle this
nicely.
- Dan
William Lee Irwin III wrote:
>> I think this would be useful for network daemons that would like to
>> fairly schedule responses (i.e. not re-arm until a client on a given fd
>> deserves a turn again). IRC daemons would appear to be a perfect
>> candidate for such. ...
On Mon, May 19, 2003 at 06:37:49PM -0700, Dan Kegel wrote:
> No need. The plain old edge triggered behavior can handle this
> nicely.
AIUI after the iospace on an fd is exhausted the event will be re-armed.
It could probably be taken and then ignored until the client deserves a
response again. Is that what you had in mind?
(Don't take this too far; I'm in hypothetical land and am not pushing for
the feature hard if at all.)
-- wli
William Lee Irwin III wrote:
> William Lee Irwin III wrote:
>
>>>I think this would be useful for network daemons that would like to
>>>fairly schedule responses (i.e. not re-arm until a client on a given fd
>>>deserves a turn again). IRC daemons would appear to be a perfect
>>>candidate for such. ...
>
>
> On Mon, May 19, 2003 at 06:37:49PM -0700, Dan Kegel wrote:
>
>>No need. The plain old edge triggered behavior can handle this
>>nicely.
>
>
> AIUI after the iospace on an fd is exhausted the event will be re-armed.
> It could probably be taken and then ignored until the client deserves a
> response again. Is that what you had in mind?
In edge-triggered mode, epoll will deliver an event only when events warrant it (sic).
If you decide to starve a client for a while, that client's fd
will only get an event or two as the last bits of I/O to it
occur; after that, no more events will come in unless you do
some I/O.
So I guess I'm saying "remember the fact that you got the event, but
don't do anything about it until you feel like it".
- Dan
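A rough sketch of "remember the event, act on it later" with edge triggered epoll, assuming Python's select.epoll wrapper and a non-blocking socketpair. The ready set and the function name are illustrative, and a real daemon would decide when the client deserves a turn by its own policy.

```python
import select
import socket


def deferred_service():
    """Return (events seen while starving the client, data read later)."""
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    ep = select.epoll()
    ready = set()                 # fds known readable but not yet serviced
    try:
        ep.register(reader.fileno(), select.EPOLLIN | select.EPOLLET)
        writer.send(b"hello")
        ready.update(fd for fd, _ in ep.poll(0))   # remember the event
        quiet = ep.poll(0)        # no new I/O, so no further events arrive
        data = b""
        if reader.fileno() in ready:
            # The client's turn has come: drain until EAGAIN.
            try:
                while True:
                    chunk = reader.recv(4096)
                    if not chunk:         # peer closed the connection
                        break
                    data += chunk
            except BlockingIOError:
                ready.discard(reader.fileno())
        return quiet, data
    finally:
        ep.close()
        reader.close()
        writer.close()
```

The fd stays registered the whole time, so no re-arm call is needed.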