I have a program that spawns 3 threads, and there is a great deal of
sharing among them: they all read and update a couple of
shared variables.
I run this on a Nehalem machine (8 cores, 4 cores per physical package), and I
find that running the 3 threads on cores 0, 1 and 2 (using
pthread_setaffinity_np) gives much better results. This is because the
cache coherency protocol performs much better within the same package.
However, if I let Linux (SUSE Enterprise) schedule the threads, the performance
is much worse. I suspect that Linux is scheduling the 3 threads
across the physical packages (2 on one package, 1 on the
other). Is this possible? Why does Linux do this?
Thanks
Xin
Xin Tong wrote:
> [original message snipped]
The following paper on CPU schedulers was posted recently. Maybe it will give you
a starting point:
http://research.cs.wisc.edu/wind/Publications/meehean-thesis11.html