2000-10-28 00:47:03

by John Myers

[permalink] [raw]
Subject: Re: Linux's implementation of poll() not scalable?

Linus Torvolds wrote:
> So sticky arrays of events are good, while queues are bad. Let's take that
> as one of the fundamentals.

Please let's not. There is nothing fundamentally different between an
event queue of size N and an interest set of size N.

Your proposed interface suffers from most of the same problems as the
other Unix event interfaces I've seen. Key among the problems are
inherent race conditions when the interface is used by multithreaded
applications.

The "stickiness" of the event binding can cause race conditions where
two threads simultaneously attempt to handle the same event. For
example, consider a socket becomes writeable, delivering a writable
event to one of the multiple threads calling get_events(). While the
callback for that event is running, it writes to the socket, causing the
socket to become non-writable and then writeable again. That in turn
can cause another writable event to be delivered to some other thread.

In the async I/O library I work with, this problem is addressed by
having delivery of the event atomically remove the binding. If the
event needs to be bound after the callback is finished, then it is the
responsibility for the callback to rebind the event. Typically, a
server implements a command/response protocol, where the read callback
reads a command, writes the response, and repeats. It is quite likely
that after the read callback completes, the connection is interested in
a write event, not another read event.

Cancellation of event bindings is another area prone to race
conditions. A thread canceling an event binding frequently needs to
know whether or not that event has been delivered to some other thread.
If it has, the canceling thread has to synchronize with the other thread
before destroying any resources associated with the event.

Multiple event queues are needed by libraries as different libraries can
have different thread pools. The threads in those different pools can
have different characteristics, such as stack size, priority, CPU
affinity, signal state, etc.

There are three performance issues that need to be addressed by the
implementation of get_events(). One is that events preferably be given
to threads that are the same CPU as bound the event. That allows the
event's context to remain in the CPU's cache.

Two is that of threads on a given CPU, events should wake threads in
LIFO order. This allows excess worker threads to be paged out.

Three is that the event queue should limit the number of worker threads
contending with each other for CPU. If an event is triggered while
there are enough worker threads in runnable state, it is better to wait
for one of those threads to block before delivering the event.


Attachments:
smime.p7s (2.10 kB)
S/MIME Cryptographic Signature