2000-10-30 00:37:37

by Dan Kegel

Subject: Readiness vs. completion (was: Re: Linux's implementation of poll() not scalable?)

John Gardiner Myers <[email protected]> wrote:
> Your proposed interface suffers from most of the same problems as the
> other Unix event interfaces I've seen. Key among the problems are
> inherent race conditions when the interface is used by multithreaded
> applications.
>
> The "stickiness" of the event binding can cause race conditions where
> two threads simultaneously attempt to handle the same event. For
> example, consider a socket that becomes writeable, delivering a writable
> event to one of the multiple threads calling get_events(). While the
> callback for that event is running, it writes to the socket, causing the
> socket to become non-writable and then writeable again. That in turn
> can cause another writable event to be delivered to some other thread.
> ...
> In the async I/O library I work with, this problem is addressed by
> having delivery of the event atomically remove the binding. If the
> event needs to be bound after the callback is finished, then it is the
> responsibility of the callback to rebind the event.

IMHO you're describing a situation where a 'completion notification event'
(as with aio) would be more appropriate than a 'readiness notification event'
(as with poll).

With completion notification, one naturally expects 'edge triggered',
'one shot' behavior from the notification system, with no event coalescing,
and there is no need to remove or reestablish bindings.
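
The scheme quoted above, where delivering an event removes its binding and the callback rebinds if it wants more, can be sketched on top of plain poll(). This is only an illustration, assuming Python's select.poll as a stand-in for the proposed interface; OneShotPoller and bind are hypothetical names, and the removal is not atomic across threads here (a real implementation would do it inside the kernel or under a lock).

```python
import select
import socket


class OneShotPoller:
    """Readiness poller whose bindings are dropped when an event is delivered.

    Hypothetical single-threaded sketch; not the interface from the thread.
    """

    def __init__(self):
        self._poll = select.poll()
        self._bindings = {}  # fd -> (socket, callback)

    def bind(self, sock, mask, callback):
        """Bind callback to the poll events in mask on sock."""
        fd = sock.fileno()
        self._bindings[fd] = (sock, callback)
        self._poll.register(fd, mask)

    def get_events(self, timeout_ms=0):
        """Deliver pending events and return how many were delivered."""
        delivered = 0
        for fd, revents in self._poll.poll(timeout_ms):
            sock, callback = self._bindings.pop(fd)
            # Remove the binding before the callback runs, so the same
            # event cannot be handed out again until it is rebound.
            self._poll.unregister(fd)
            callback(self, sock, revents)
            delivered += 1
        return delivered
```

A callback that wants further events calls poller.bind() again before returning. With ordinary poll() the still-writable socket would be reported on every call.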

> There are three performance issues that need to be addressed by the
> implementation of get_events(). One is that events preferably be given
> to threads on the same CPU that bound the event. That allows the
> event's context to remain in the CPU's cache.
>
> Two is that of threads on a given CPU, events should wake threads in
> LIFO order. This allows excess worker threads to be paged out.
>
> Three is that the event queue should limit the number of worker threads
> contending with each other for CPU. If an event is triggered while
> there are enough worker threads in runnable state, it is better to wait
> for one of those threads to block before delivering the event.

That describes NT's 'completion port / thread pooling' scheme, I think
(which incidentally is a 'completion notification' scheme rather than a
'readiness notification' scheme).
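
The second of the quoted points, waking waiting threads in LIFO order, can be sketched in user space. This is a rough illustration only: LifoEventQueue, post and waiting are hypothetical names (get_event echoes the get_events() call discussed above), and the sketch covers neither CPU affinity nor limiting the number of runnable workers.

```python
import threading


class LifoEventQueue:
    """Hands each posted event to the most recently blocked waiter."""

    def __init__(self):
        self._lock = threading.Lock()
        self._waiters = []  # stack of blocked workers, newest last
        self._pending = []  # events posted while nobody was waiting

    def get_event(self):
        """Block until an event is available and return it."""
        with self._lock:
            if self._pending:
                return self._pending.pop(0)
            slot = {"wake": threading.Event(), "event": None}
            self._waiters.append(slot)
        slot["wake"].wait()
        return slot["event"]

    def post(self, event):
        """Deliver event to the newest waiter, or queue it if none is blocked."""
        with self._lock:
            if self._waiters:
                slot = self._waiters.pop()  # LIFO: most recent waiter first
                slot["event"] = event
                slot["wake"].set()
            else:
                self._pending.append(event)

    def waiting(self):
        """Number of workers currently blocked in get_event()."""
        with self._lock:
            return len(self._waiters)
```

Because the oldest waiters are woken last, surplus worker threads stay idle under light load, which is what lets them be paged out.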

I suspect readiness notification using edge triggering is a
strange beast, not often seen in the wild, and hard to define precisely.

I'm going to risk generalizing, and categorizing the existing base
of application software into two groups. Would it be going too far to say
the following:

Readiness notification, like that provided by traditional poll(),
fits naturally with level-triggered events with event coalescing,
and a large body of traditional Unix software exists that uses this paradigm.

Completion notification, like that provided by aio and NT's networking,
fits naturally with edge-triggered events with no event coalescing,
and a large body of win32 software exists that uses this paradigm.
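
To make "level-triggered with event coalescing" concrete, here is a small experiment using Python's select.poll as a stand-in for poll(2). The helper name count_readiness_reports is made up for this illustration. Several writes are made before any poll, then the descriptor is polled repeatedly without reading.

```python
import select
import socket


def count_readiness_reports(writes, polls):
    """Send each chunk in writes, then poll `polls` times without reading.

    Returns (events reported per poll, data drained, events after draining).
    """
    reader, writer = socket.socketpair()
    try:
        poller = select.poll()
        poller.register(reader.fileno(), select.POLLIN)
        for chunk in writes:
            writer.send(chunk)
        # Level-triggered: every poll reports the fd while data is pending.
        # Coalesced: several writes still yield one event per poll.
        reports = [len(poller.poll(0)) for _ in range(polls)]
        drained = reader.recv(4096)
        after_drain = len(poller.poll(0))
        return reports, drained, after_drain
    finally:
        reader.close()
        writer.close()
```

Each poll reports a single readable event no matter how many writes preceded it, and keeps reporting it until the data is read. A completion-style interface would instead report each finished operation once.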

And, come to think of it, network programmers usually can be categorized
into the same two groups :-) Each style of programming is an acquired taste.

IMHO if Linux is to be maximally popular with software developers
(desirable if we want to boost the number of apps available for Linux),
it would help to cater to both flavors of network programming.

So I'd like to see both a high-performance level-triggered readiness
notification API with event coalescing, and a high-performance edge-triggered
completion API with no event coalescing. With luck, they'll be the
same API, but with slightly different flag values.

- Dan


2000-10-30 18:56:51

by John Myers

Subject: Re: Readiness vs. completion (was: Re: Linux's implementation of poll() not scalable?)



Dan Kegel wrote:
> IMHO you're describing a situation where a 'completion notification event'
> (as with aio) would be more appropriate than a 'readiness notification event'
> (as with poll).

I've found that I want both types of events, preferably through the same
interface. To provide a "completion notification event" interface on
top of an existing nonblocking interface, one needs an "async poll"
mechanism with edge-triggered events with no event coalescing.

You are correct in recognizing NT completion ports from my description.
While the NT completion port interface is ugly as sin, it gets a number
of performance issues right.

> And, come to think of it, network programmers usually can be categorized
> into the same two groups :-) Each style of programming is an acquired taste.

I would say that the "completion notification" style is a paradigm
beyond the "readiness notification" style. I started with the select()
model of network programming and have since learned the clear
superiority of the "completion notification" style.



2000-10-30 20:45:26

by John Myers

Subject: Re: Readiness vs. completion (was: Re: Linux's implementation of poll() not scalable?)



Dan Kegel wrote:
> If you have a top-notch completion notification event interface
> provided natively by the OS, though, does that get rid of the
> need for the "async poll" mechanism?

A top-notch completion notification event interface needs to be able to
provide "async poll" functionality. There are some situations where an
application needs a completion notification event when an fd is readable
or writeable, but cannot supply buffers or data until after the event
arrives.

One of these situations is when the application is using a nonblocking
interface to an existing library. When the library returns a
"wouldblock" condition, the application determines through the interface
(or the interface definition) which poll events need to occur before a
subsequent call to the library is likely to result in progress. The
application then needs to schedule a completion event for when those
poll events occur. The application does not know enough about the
library implementation to schedule async I/O and the library is not
written to use async I/O itself.
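
The nonblocking-library case might look roughly like the following. Everything here is an assumption made for illustration: WouldBlock, library_read_line and call_when_ready are hypothetical names, the "library" is a toy line reader over a nonblocking socket, and the helper simply blocks in poll() where a real application would queue the retry as a completion event.

```python
import select
import socket


class WouldBlock(Exception):
    """Hypothetical library error: retry after `events` occur on `fd`."""

    def __init__(self, fd, events):
        super().__init__(fd, events)
        self.fd = fd
        self.events = events


def library_read_line(sock, state):
    """Stand-in for an opaque nonblocking library call (hypothetical)."""
    while b"\n" not in state["buf"]:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            # The library tells the caller which poll events it needs.
            raise WouldBlock(sock.fileno(), select.POLLIN)
        if not data:
            raise EOFError("peer closed the connection")
        state["buf"] += data
    line, _, state["buf"] = state["buf"].partition(b"\n")
    return line


def call_when_ready(poller, op, timeout_ms=1000):
    """Retry op(), waiting for whichever poll events the library asks for."""
    while True:
        try:
            return op()
        except WouldBlock as wb:
            poller.register(wb.fd, wb.events)
            ready = poller.poll(timeout_ms)
            poller.unregister(wb.fd)
            if not ready:
                raise TimeoutError("requested poll events did not occur")
```

The application never supplies a buffer or issues the read itself. It only learns which poll events to wait for and calls back into the library once they occur.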

Another situation occurs when handling a large number of mostly idle
connections. Consider a protocol for which a server receives one
command per half hour per connection. A server process would want to
handle hundreds of thousands to millions of such connections. If the
server were to use asynchronous read operations, then it would have to
allocate one input buffer per connection. Better to instead use
asynchronous read poll operations, allocating buffers to connections
only when those connections have pending input.
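
A minimal sketch of that buffer-on-demand pattern, assuming Python's select.poll stands in for the "asynchronous read poll" operation. The function name read_ready_connections and the conns mapping (fd to socket) are hypothetical.

```python
import select
import socket


def read_ready_connections(poller, conns, bufsize=4096):
    """Read from connections that are readable, allocating buffers lazily.

    conns maps file descriptors to sockets already registered with poller.
    Idle connections cost no buffer at all.
    """
    results = {}
    for fd, revents in poller.poll(0):
        if revents & select.POLLIN:
            buf = bytearray(bufsize)  # allocated only for a ready connection
            n = conns[fd].recv_into(buf)
            results[fd] = bytes(buf[:n])
    return results
```

With asynchronous reads posted up front, every connection would pin a buffer for its whole idle period. Here memory use scales with the number of connections that currently have pending input.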

This latter situation would be further improved by a variant of the
asynchronous read operation where the buffer is supplied by either the
event queue object or the caller to get_event(), but that's a separate
issue.



2000-10-31 07:12:13

by Dan Kegel

Subject: Re: Readiness vs. completion (was: Re: Linux's implementation of poll() not scalable?)

John Gardiner Myers wrote:
>
> Dan Kegel wrote:
> > IMHO you're describing a situation where a 'completion notification event'
> > (as with aio) would be more appropriate than a 'readiness notification event'
> > (as with poll).
>
> I've found that I want both types of events, preferably through the same
> interface.

That's good to know.

> To provide a "completion notification event" interface on
> top of an existing nonblocking interface, one needs an "async poll"
> mechanism with edge-triggered events with no event coalescing.

If you have a top-notch completion notification event interface
provided natively by the OS, though, does that get rid of the
need for the "async poll" mechanism?

> You are correct in recognizing NT completion ports from my description.
> While the NT completion port interface is ugly as sin, it gets a number
> of performance issues right.
>
> > And, come to think of it, network programmers usually can be categorized
> > into the same two groups :-) Each style of programming is an acquired taste.
>
> I would say that the "completion notification" style is a paradigm
> beyond the "readiness notification" style. I started with the select()
> model of network programming and have since learned the clear
> superiority of the "completion notification" style.

Both seem to have their place, and deserve good support, IMHO.

- Dan