Hi --
I have been running some experiments involving processes sending large
volumes of data concurrently. The results show (on Linux 2.6.19.2)
that although the total throughput achieved by all the processes
remains constant, the jitter increases as the number of processes
increases. Beyond about 64 processes (on a 2.4 GHz Xeon with 4 MB of
cache), processes start getting starved and the streams get very
bursty.
Are there any steps one can take to make the jitter scale better, e.g.
by using another scheduler? I suppose it is reasonable for the jitter
to grow with added contention in the TCP/IP stack, but what growth rate
is acceptable? Is the data I have below reasonable?
The jitter varies as follows and is shown as mean +/- standard deviation
across 25 ten-second intervals (a sketch of the calculation follows the
table).
Concurrency    Jitter (us)
          1    1.6 +/- 0.8
          2    1.5 +/- 1.1
          4    1.4 +/- 0.6
          8    0.8 +/- 0.5
         12    2.3 +/- 1.2
         16    3.3 +/- 2.1
         20    4.6 +/- 2.5
         24    6.2 +/- 1.4
         28    7.8 +/- 3.2
         32   10.0 +/- 3.4
         64    100+
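For concreteness, here is a minimal sketch of how figures like these can be
computed, assuming each row is simply the mean and standard deviation of the
25 per-interval jitter samples; the sample values below are placeholders,
not the measured data.

#include <math.h>
#include <stdio.h>

/* Mean and standard deviation of per-interval jitter samples, i.e. the
 * "mean +/- standard deviation across 25 ten-second intervals" above. */
static void jitter_stats(const double *jitter_us, int n,
                         double *mean, double *sd)
{
    double sum = 0.0, var = 0.0;

    for (int i = 0; i < n; i++)
        sum += jitter_us[i];
    *mean = sum / n;

    for (int i = 0; i < n; i++)
        var += (jitter_us[i] - *mean) * (jitter_us[i] - *mean);
    *sd = sqrt(var / n);
}

int main(void)
{
    /* Placeholder samples; the real run uses 25 ten-second intervals. */
    double samples[] = { 1.2, 1.9, 1.6, 2.4, 1.1 };
    double mean, sd;

    jitter_stats(samples, (int)(sizeof samples / sizeof samples[0]),
                 &mean, &sd);
    printf("%.1f +/- %.1f us\n", mean, sd);
    return 0;
}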
Sen
> I have been running some experiments involving processes sending large
> volumes of data concurrently. The results show (on Linux 2.6.19.2)
> that although the total throughput achieved by all the processes
> remains constant, the jitter increases as the number of processes
> increases. Beyond about 64 processes (on a 2.4 GHz Xeon with 4 MB of
> cache), processes start getting starved and the streams get very
> bursty.
It's hard to know exactly what's going on in your case. But my guess is it's
basically this simple: all the processes decide to send data at roughly the
same time (because they're using the same time source), and it simply takes
the scheduler a while to get around to all of them. If that is the issue,
the fix is to rig things so that not all processes decide to send at
precisely the same time.
This may involve raising the value of HZ. It may involve changing your
timing logic.
Think about it this way -- if each process is trying to send data every
tenth of a second, and they all try to send at precisely each even tenth of
a second of wall time, each process will on average have to wait for half of
the other processes to finish before it sends. (And your CPU will be blasted
each tenth of a second and idle in between.) But if each process picks a
random offset within that tenth of a second, the CPU will be evenly loaded
and no process has to wait nearly as long.
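For illustration, a minimal sketch of that staggering idea, assuming a
100 ms period and absolute-deadline sleeps; send_chunk() is a stand-in for
whatever actually writes this process's stream.

#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define PERIOD_NS 100000000L   /* send every tenth of a second */

static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

int main(void)
{
    struct timespec next;

    /* Fixed per-process random phase in [0, PERIOD_NS): this is what
     * keeps the senders from all waking on the same tick. */
    srand(getpid());
    long phase = rand() % PERIOD_NS;

    clock_gettime(CLOCK_MONOTONIC, &next);
    timespec_add_ns(&next, phase);

    for (;;) {
        /* Sleep to an absolute deadline so drift doesn't accumulate. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);

        /* send_chunk();  placeholder for the actual send */

        timespec_add_ns(&next, PERIOD_NS);
    }
}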
If your CPU usage is lower than 75% or so, that would indicate that this may
be the issue. (Though it could also be the burstiness of any I/O needed to
get the stream data, if you're testing with real data. Look for processes
blocked on I/O.)
Basically, troubleshoot. Is the CPU maxed? If so, then you know what the
problem is. If not, why not?!
By the way, a process-per-stream model is really not likely to be a
particularly good one. For one thing, it forces a full process context
switch every time you switch which stream you're working on. For another
thing, it requires each stream to run completely independent timing code,
rather than leveraging one timing engine. (There may be other good reasons
to choose this model, of course, but you may be seeing some of its downsides
here.)
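For what it's worth, a rough sketch of the "one timing engine" alternative:
a single process keeps a per-stream deadline and services whichever stream
is due next, so there is no per-stream process, no extra context switches,
and the phases can be spread deliberately. stream_send() and the stream
struct fields are placeholders, not a real implementation.

#define _POSIX_C_SOURCE 200112L
#include <time.h>

#define NSTREAMS  32
#define PERIOD_NS 100000000L   /* each stream sends every tenth of a second */

struct stream {
    struct timespec deadline;  /* next time this stream should send */
    /* ... socket, buffer, etc. ... */
};

static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

static int before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

void run_streams(struct stream *s)
{
    struct timespec base;
    clock_gettime(CLOCK_MONOTONIC, &base);

    /* Stagger the initial deadlines so the streams are spread evenly
     * across the period rather than all firing on the same tick. */
    for (int i = 0; i < NSTREAMS; i++) {
        s[i].deadline = base;
        timespec_add_ns(&s[i].deadline, (PERIOD_NS / NSTREAMS) * i);
    }

    for (;;) {
        /* Pick the stream whose deadline comes first. */
        int next = 0;
        for (int i = 1; i < NSTREAMS; i++)
            if (before(&s[i].deadline, &s[next].deadline))
                next = i;

        /* Sleep until that deadline, then service just that stream. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME,
                        &s[next].deadline, NULL);
        /* stream_send(&s[next]);  placeholder for the real send */
        timespec_add_ns(&s[next].deadline, PERIOD_NS);
    }
}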
DS