RT-ers,
Lately we've been struggling with some performance issues on high-core
count (>16 cores) NUMA machines with the RT kernel. During the course
of troubleshooting this issue, we tried using the 'numactl' program to
constrain our measurement testing tool (rteval) to a particular memory
node, rather than letting everything float. Doing so showed marked
improvement in both max latency and jitter. While this doesn't solve
our performance problems, I thought it might make sense to have a --numa
mode for cyclictest that complements the --smp mode just added.
The big difference here is that when using --numa, each measurement
thread (one per CPU) has its stack allocated from the memory node
associated with its CPU. Also, the major data structures for each
thread (parameter block, statistics block and histogram) are allocated
from the appropriate node. This is done with calls into libnuma,
which means this will add a dependency on libnuma.
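A minimal sketch of the per-thread, node-local setup described above, assuming libnuma's numa_node_of_cpu(), numa_alloc_onnode() and numa_free() plus glibc's pthread_attr_setaffinity_np() and pthread_attr_setstack(). The struct layout, sizes, helper names (node_alloc, thread_param_create, thread_start, ...) and the HAVE_LIBNUMA macro are hypothetical and not taken from the actual numa branch; without HAVE_LIBNUMA it falls back to ordinary page-aligned heap memory so it still builds.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifdef HAVE_LIBNUMA   /* hypothetical: assumed to be defined by the build system */
#include <numa.h>
#define node_of_cpu(cpu) numa_node_of_cpu(cpu)
#else                 /* no libnuma: treat the whole machine as node 0 */
#define node_of_cpu(cpu) ((void)(cpu), 0)
#endif
#define HIST_BUCKETS 1000       /* hypothetical histogram size */
#define THREAD_STACK (1 << 20)  /* hypothetical 1 MiB stack per thread */

/* Hypothetical per-thread data: parameter, statistics and histogram blocks. */
struct thread_stats { long samples; };
struct thread_param {
	int cpu;
	struct thread_stats *stats;
	long *histogram;
	void *stack;
};

/* Zeroed, page-aligned memory on `node`; zeroing also faults the pages in. */
static void *node_alloc(size_t size, int node)
{
	void *p = NULL;
#ifdef HAVE_LIBNUMA
	p = numa_alloc_onnode(size, node);
#else
	(void)node;
	if (posix_memalign(&p, (size_t)sysconf(_SC_PAGESIZE), size) != 0)
		p = NULL;
#endif
	return p ? memset(p, 0, size) : NULL;
}

static void node_free(void *p, size_t size)
{
#ifdef HAVE_LIBNUMA
	if (p)
		numa_free(p, size);
#else
	(void)size;
	free(p);
#endif
}

static void thread_param_destroy(struct thread_param *par)
{
	node_free(par->stack, THREAD_STACK);
	node_free(par->histogram, HIST_BUCKETS * sizeof(long));
	node_free(par->stats, sizeof(*par->stats));
	node_free(par, sizeof(*par));
}

/* Allocate one thread's blocks and its stack on the node of its CPU. */
static struct thread_param *thread_param_create(int cpu)
{
	int node = node_of_cpu(cpu);
	struct thread_param *par;
	if (node < 0 || !(par = node_alloc(sizeof(*par), node)))
		return NULL;
	par->cpu = cpu;
	par->stats = node_alloc(sizeof(*par->stats), node);
	par->histogram = node_alloc(HIST_BUCKETS * sizeof(long), node);
	par->stack = node_alloc(THREAD_STACK, node);
	if (!par->stats || !par->histogram || !par->stack) {
		thread_param_destroy(par);
		return NULL;
	}
	return par;
}

static void *measure_thread(void *arg)
{
	((struct thread_param *)arg)->stats->samples++;  /* stand-in for the real loop */
	return NULL;
}

/* Start the thread pinned to par->cpu, running on its node-local stack. */
static int thread_start(pthread_t *tid, struct thread_param *par)
{
	pthread_attr_t attr;
	cpu_set_t set;
	int ret;
	pthread_attr_init(&attr);
	CPU_ZERO(&set);
	CPU_SET(par->cpu, &set);
	ret = pthread_attr_setaffinity_np(&attr, sizeof(set), &set);
	if (!ret)
		ret = pthread_attr_setstack(&attr, par->stack, THREAD_STACK);
	if (!ret)
		ret = pthread_create(tid, &attr, measure_thread, par);
	pthread_attr_destroy(&attr);
	return ret;
}
```

Built with -DHAVE_LIBNUMA and -lnuma, the allocations come from the CPU's node. Note that the stack has to exist before pthread_create(), which is why the node is looked up from the target CPU number rather than from inside the running thread.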
The intent is to measure latency on a NUMA system the way a
well-written RT application would run on it, that is, by
minimizing off-node memory references.
If you're interested in looking at this, please pull the numa branch
from my git repo at:
git://git.kernel.org/pub/scm/linux/kernel/git/clrkwllms/rt-tests.git
and let me know if you find bugs or disagree with the approach.
Thanks,
Clark
On Tue, 19 Jan 2010, Clark Williams wrote:
> The big difference here is that when using --numa, each measurement
> thread (one per CPU) has its stack allocated from the memory node
> associated with its CPU. Also, the major data structures for each
> thread (parameter block, statistics block and histogram) are allocated
> from the appropriate node. This is done with calls into libnuma,
> which means this will add a dependency on libnuma.
That might cause some trouble for embedded folks. :(
> The intent is to measure latency on a NUMA system the way a
> well-written RT application would run on it, that is, by
> minimizing off-node memory references.
Agreed.
tglx
On Wed, 20 Jan 2010 07:51:41 +0100 (CET)
Thomas Gleixner wrote:
> On Tue, 19 Jan 2010, Clark Williams wrote:
> > The big difference here is that when using --numa, each measurement
> > thread (one per CPU) has its stack allocated from the memory node
> > associated with its CPU. Also, the major data structures for each
> > thread (parameter block, statistics block and histogram) are allocated
> > from the appropriate node. This is done with calls into libnuma,
> > which means this will add a dependency on libnuma.
>
> That might cause some trouble for embedded folks. :(
Yeah, that's why I sent the RFC; I wanted to see who would hate me for
it :).
Carsten already told me off-list that one of his build machines didn't
have numa.h, so I'm going to have to rearrange the build a bit.
As much as I hate to say it, I think the best option is to use autoconf
to detect if libnuma is available on the build platform and to take
appropriate steps if it's not.
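On the source side, the "appropriate steps" could be a compile-time switch plus a run-time check. This sketch assumes configure defines HAVE_LIBNUMA when libnuma is found; the resolve_numa_mode() helper and the choice to fall back to --smp behaviour rather than exit are hypothetical.

```c
#include <stdio.h>

#ifdef HAVE_LIBNUMA   /* assumed: defined by configure when libnuma is found */
#include <numa.h>
#endif

/*
 * Decide whether a requested --numa run can be honoured. Returns 1 to use
 * node-local allocation, 0 to fall back to plain --smp behaviour (a
 * hypothetical policy; exiting with an error would be equally valid).
 */
static int resolve_numa_mode(int numa_requested)
{
	if (!numa_requested)
		return 0;
#ifdef HAVE_LIBNUMA
	/* numa_available() returns -1 when the NUMA API is not supported on
	 * the running system, even if the binary was built against libnuma. */
	if (numa_available() == -1) {
		fprintf(stderr, "--numa: NUMA not available, using --smp mode\n");
		return 0;
	}
	return 1;
#else
	fprintf(stderr, "--numa: built without libnuma, using --smp mode\n");
	return 0;
#endif
}
```

On the autoconf side, AC_CHECK_LIB([numa], [numa_available]) defines HAVE_LIBNUMA and adds -lnuma to LIBS when the library links; pairing it with AC_CHECK_HEADERS([numa.h]) would also catch a machine that has the library but not the header.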
The other idea I toyed with was dynamic loading of libnuma so there's
not an install dependency for the libnuma package with the rt-tests
package. I only use five functions from libnuma, so that's not too bad
a set of function pointers to manage. Hmmm, that probably won't work
very well, since I'll still have to include numa.h. Sigh...
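For comparison, a sketch of what the dlopen() variant might look like. One way around needing numa.h at build time is to declare the few prototypes by hand from the signatures documented in numa(3), at the cost of keeping them in sync yourself. The four functions shown are illustrative and may not be the five the patch uses; the soname libnuma.so.1 and the numa_ops / numa_ops_load() / ops_alloc() names are assumptions.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdlib.h>

/* Prototypes declared by hand instead of including numa.h; they must be
 * kept in step with the signatures documented in numa(3). */
struct numa_ops {
	void *handle;                 /* NULL when libnuma is not in use */
	int (*available)(void);
	int (*node_of_cpu)(int cpu);
	void *(*alloc_onnode)(size_t size, int node);
	void (*free_mem)(void *start, size_t size);
};

/* Load libnuma at run time. Returns 1 if every symbol resolved and
 * numa_available() succeeded; otherwise leaves *ops zeroed and returns 0. */
static int numa_ops_load(struct numa_ops *ops)
{
	struct numa_ops o = { 0 };

	o.handle = dlopen("libnuma.so.1", RTLD_NOW | RTLD_LOCAL);
	if (!o.handle)
		goto fail;
	o.available = (int (*)(void))dlsym(o.handle, "numa_available");
	o.node_of_cpu = (int (*)(int))dlsym(o.handle, "numa_node_of_cpu");
	o.alloc_onnode = (void *(*)(size_t, int))dlsym(o.handle, "numa_alloc_onnode");
	o.free_mem = (void (*)(void *, size_t))dlsym(o.handle, "numa_free");
	if (!o.available || !o.node_of_cpu || !o.alloc_onnode || !o.free_mem)
		goto fail;
	if (o.available() == -1)
		goto fail;
	*ops = o;
	return 1;
fail:
	if (o.handle)
		dlclose(o.handle);
	*ops = (struct numa_ops){ 0 };
	return 0;
}

/* Node-local allocation when libnuma was loaded, plain heap otherwise. */
static void *ops_alloc(const struct numa_ops *ops, size_t size, int node)
{
	return ops->handle ? ops->alloc_onnode(size, node) : malloc(size);
}

static void ops_free(const struct numa_ops *ops, void *p, size_t size)
{
	if (ops->handle)
		ops->free_mem(p, size);
	else
		free(p);
}
```

On glibc older than 2.34 this also needs -ldl. If loading fails, callers keep using ops_alloc()/ops_free(), which fall back to malloc()/free().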
Clark
> The other idea I toyed with was dynamic loading of libnuma so there's
> not an install dependency for the libnuma package with the rt-tests
> package. I only use five functions from libnuma, so that's not too bad
> a set of function pointers to manage. Hmmm, that probably won't work
> very well, since I'll still have to include numa.h. Sigh...
It's overkill.
We are not talking about a package that will be installed on millions of
systems in binary form.
rt-tests is used by developers, and developers can compile it with
whatever options they need.
Nikita