2007-11-09 22:34:27

by Micah Elizabeth Scott

Subject: High priority tasks break SMP balancer?

I've been investigating a problem recently, in which N runnable
CPU-bound tasks on an N-way machine run on only N-1 CPUs. The
remaining CPU is almost 100% idle. I have seen it occur with both the
CFS and O(1) schedulers.

I've traced this down to what seems to be a quirk in the SMP balancer,
whereby a high-priority thread which spends most of its time sleeping
can artificially inflate the CPU load average calculated for one
processor. Most of the time this CPU is idle (nr_running==0) yet its
CPU load average is much higher than that of any other CPU.

Please find attached a sample program which demonstrates this
behaviour on a 2-way SMP machine. It creates three threads: two are
CPU bound and run at the default priority, the third spends most of
its time sleeping and runs at an elevated priority. It wakes up
frequently (using /dev/rtc) and randomly generates some CPU load.
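
In case the attachment gets mangled, its overall structure is roughly
the following. This is only a minimal sketch, not the attached
priosched.c itself; the function names are made up and the constants
(nice -20, 1024 Hz, ~5% load) are merely representative:

#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/resource.h>
#include <linux/rtc.h>

/* Two of these run at the default priority and just burn CPU. */
static void *busy_loop(void *arg)
{
    volatile unsigned long x = 0;
    for (;;)
        x++;
    return NULL;
}

int main(void)
{
    pthread_t hog[2];
    int i, rtc = open("/dev/rtc", O_RDONLY);

    for (i = 0; i < 2; i++)
        pthread_create(&hog[i], NULL, busy_loop, NULL);

    /* The main thread is the mostly-sleeping, elevated-priority one. */
    setpriority(PRIO_PROCESS, 0, -20);
    ioctl(rtc, RTC_IRQP_SET, 1024);     /* wake at ~1024 Hz */
    ioctl(rtc, RTC_PIE_ON, 0);

    for (;;) {
        unsigned long data;
        volatile unsigned long spin;

        read(rtc, &data, sizeof data);  /* sleep until the next RTC tick */
        if (rand() % 100 < 5)           /* occasionally generate some load */
            for (spin = 0; spin < 500000; spin++)
                ;
    }
    return 0;
}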

On my machine (2-way Opteron with a vanilla 2.6.23.1 kernel) this test
program will reliably put the scheduler into a state where one CPU has
both of the busy-looping processes in its runqueue, and the other CPU
is usually idle. The usually-idle CPU will have a very high cpu_load,
as reported by /proc/sched_debug.

Your mileage may vary. On some machines, this test program will only
enter the "bad" state for a few seconds. Sometimes we bounce back and
forth between good and bad states every few seconds. In all cases,
removing the priority elevation fixes the balancing problem.

Is this a behaviour any of the scheduler developers are aware of? I
would be very grateful if anyone could shed some light on the root
cause behind the inflated cpu_load average. If this turns out to be a
real bug, I would be happy to work on a patch.

Thanks in advance,
Micah Dowty


Attachments:
priosched.c (3.98 kB)

2007-11-09 23:55:37

by Cyrus Massoumi

Subject: Re: High priority tasks break SMP balancer?

Hi Micah

> On my machine (2-way Opteron with a vanilla 2.6.23.1 kernel) this test
> program will reliably put the scheduler into a state where one CPU has
> both of the busy-looping processes in its runqueue, and the other CPU
> is usually idle. The usually-idle CPU will have a very high cpu_load,
> as reported by /proc/sched_debug.

I tried your program on my machine (C2D, 2.6.17, O(1) scheduler).

Both CPUs are 100% busy all the time. Each busy-looping thread is
running on its own CPU. I've been watching top output for 10 minutes,
the spreading is stable and the threads don't bounce at all.


greetings
Cyrus


2007-11-10 00:11:16

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Sat, Nov 10, 2007 at 12:56:07AM +0100, Cyrus Massoumi wrote:
> I tried your program on my machine (C2D, 2.6.17, O(1) scheduler).
>
> Both CPUs are 100% busy all the time. Each busy-looping thread is running
> on its own CPU. I've been watching top output for 10 minutes, the spreading
> is stable and the threads don't bounce at all.

As I said, YMMV. I haven't been able to find a single set of
parameters for the demo program which causes the problem to occur 100%
of the time on all systems.

In general, boosting the MAINTHREAD_PRIORITY even more and increasing
the WAKE_HZ should exaggerate the problem. These parameters reproduce
the problem very reliably on my system:

#define NUM_BUSY_THREADS 2
#define MAINTHREAD_PRIORITY -20
#define MAINTHREAD_WAKE_HZ 1024
#define MAINTHREAD_LOAD_PERCENT 5
#define MAINTHREAD_LOAD_CYCLES 2

It's also possible this problem doesn't occur on 2.6.17. I have only
tested this example on 2.6.23.1 and 2.6.20 so far.

Thanks,
--Micah

2007-11-14 18:39:41

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Fri, Nov 09, 2007 at 04:11:03PM -0800, Micah Dowty wrote:
> It's also possible this problem doesn't occur on 2.6.17. I have only
> tested this example on 2.6.23.1 and 2.6.20 so far.

I tested a handful of kernel versions, and it looks like this is
indeed the case. As far as I can tell, this problem was introduced
between 2.6.19.7 and 2.6.20.

Looking at the diff between those versions, it looks like there were
some noteworthy changes to the SMP balancer. I'm still working on
narrowing this down further.

Thanks,
--Micah

2007-11-15 18:49:18

by Kyle Moffett

Subject: Re: High priority tasks break SMP balancer?

First of all, since Ingo Molnar seems to be one of the head scheduler
gurus, you might CC him on this. Also added a couple other useful
CCs for regression reports.

On Nov 09, 2007, at 19:11:03, Micah Dowty wrote:
> As I said, YMMV. I haven't been able to find a single set of
> parameters for the demo program which cause the problem to occur
> 100% of the time on all systems.
>
> In general, boosting the MAINTHREAD_PRIORITY even more and
> increasing the WAKE_HZ should exaggerate the problem. These
> parameters reproduce the problem very reliably on my system:
>
> #define NUM_BUSY_THREADS 2
> #define MAINTHREAD_PRIORITY -20
> #define MAINTHREAD_WAKE_HZ 1024
> #define MAINTHREAD_LOAD_PERCENT 5
> #define MAINTHREAD_LOAD_CYCLES 2

Well, from these statistics: if you are requesting wakeups that often
then it is probably *not* correct to try to move another thread to
that CPU in the meantime. Essentially the migration cost will
likely far outweigh the advantage of letting it run a little bit of
extra time, and in addition it will dump out cache from the
high-priority thread. As per the description, I think that an
increased priority and an increased WAKE_HZ will certainly cause the
"problem" to occur more, simply because it reduces the time between
wakeups of the high-priority process and makes it less helpful to
migrate another process over to that CPU during the sleep periods.
This will also depend on your hardware and possibly other
configuration parameters.
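
For a rough sense of scale with those parameters (assuming the 5% of
load is spread evenly across the wakeups): at MAINTHREAD_WAKE_HZ =
1024 the thread wakes about every 1/1024 s, i.e. roughly every 0.98
ms, and at MAINTHREAD_LOAD_PERCENT = 5 it only runs on the order of 50
microseconds per wakeup. The idle gap between wakeups is therefore no
bigger than a single scheduler tick (1-10 ms depending on HZ), so
there is very little window in which a migrated CPU hog could run
before being preempted again.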

I'm not really that much of an expert in this particular area,
though, so it's entirely possible that one of the above-mentioned
scheduler head-honchos will poke holes in my argument and give a
better explanation or a possible patch.

Cheers,
Kyle Moffett

2007-11-15 19:14:20

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Thu, Nov 15, 2007 at 01:48:20PM -0500, Kyle Moffett wrote:
>> In general, boosting the MAINTHREAD_PRIORITY even more and increasing the
>> WAKE_HZ should exaggerate the problem. These parameters reproduce the
>> problem very reliably on my system:
>>
>> #define NUM_BUSY_THREADS 2
>> #define MAINTHREAD_PRIORITY -20
>> #define MAINTHREAD_WAKE_HZ 1024
>> #define MAINTHREAD_LOAD_PERCENT 5
>> #define MAINTHREAD_LOAD_CYCLES 2
>
> Well from these statistics; if you are requesting wakeups that often then
> it is probably *not* correct to try to move another thread to that CPU in
> the mean-time. Essentially the migration cost will likely far outweigh the
> advantage of letting it run a little bit of extra time, and in addition it
> will dump out cache from the high-priority thread. As per the description
> I think that an increased a priority and increased WAKE_HZ will certainly
> cause the "problem" to occur more, simply because it reduces the time
> between wakeups of the high-priority process and makes it less helpful to
> migrate another process over to that CPU during the sleep periods. This
> will also depend on your hardware and possibly other configuration
> parameters.

The real problem, though, is that the high priority thread only needs
a total of a few percent worth of CPU time. The behaviour which I'm
reporting as a potential bug is that this process which needs very
little CPU time is effectively getting an entire CPU to itself,
despite the fact that other CPU-bound threads could benefit from
having time on that CPU.

One could argue that a thread with a high enough priority *should* get
a CPU all to itself even if it won't use that CPU, but that isn't the
behaviour I want. If this is in fact the intended effect of having a
high-priority thread wake very frequently, I should start a different
discussion about how to solve my specific problem without the use of
elevated priorities :)

I don't have any reason to believe, though, that this behaviour was
intentional. I just finished my "git bisect" run, and I found the
first commit after which I can reproduce the problem:

c9819f4593e8d052b41a89f47140f5c5e7e30582 is first bad commit
commit c9819f4593e8d052b41a89f47140f5c5e7e30582
Author: Christoph Lameter <[email protected]>
Date: Sun Dec 10 02:20:25 2006 -0800

[PATCH] sched: use softirq for load balancing

Call rebalance_tick (renamed to run_rebalance_domains) from a newly introduced
softirq.

We calculate the earliest time for each layer of sched domains to be rescanned
(this is the rescan time for idle) and use the earliest of those to schedule
the softirq via a new field "next_balance" added to struct rq.

Signed-off-by: Christoph Lameter <[email protected]>
Cc: Peter Williams <[email protected]>
Cc: Nick Piggin <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: "Siddha, Suresh B" <[email protected]>
Cc: "Chen, Kenneth W" <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: KAMEZAWA Hiroyuki <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

:040000 040000 2b66e4b500403869cf5925367b1ddcbb63948d01 33a27ddb470473f129a78ce310cfc59a605e173b M include
:040000 040000 939b60deffb2af2689b4aab63e21ff6c98a3b782 dd3bf32eea9556d5a099db129adc048396368adc M kernel

> I'm not really that much of an expert in this particular area, though, so
> it's entirely possible that one of the above-mentioned scheduler
> head-honchos will poke holes in my argument and give a better explanation
> or a possible patch.

Thanks! I've also CC'ed Christoph.

For reference, the exact test I used with git-bisect is attached. The
C program (priosched) starts two busy-looping threads and a
high-priority high-frequency thread which uses relatively little
CPU. The Python program repeatedly starts the C program, runs it for a
half second, and measures the resulting imbalance in CPU usage. On
kernels prior to the above commit, this reports values within about
10% of 1.0. On later kernels, it crashes within a couple iterations
due to a divide-by-zero error :)

Thanks again,
--Micah


Attachments:
test-priosched.py (711.00 B)
priosched.c (3.98 kB)

2007-11-15 20:07:56

by Christoph Lameter

Subject: Re: High priority tasks break SMP balancer?

On Thu, 15 Nov 2007, Micah Dowty wrote:

> For reference, the exact test I used with git-bisect is attached. The
> C program (priosched) starts two busy-looping threads and a
> high-priority high-frequency thread which uses relatively little
> CPU. The Python program repeatedly starts the C program, runs it for a
> half second, and measures the resulting imbalance in CPU usage. On
> kernels prior to the above commit, this reports values within about
> 10% of 1.0. On later kernels, it crashes within a couple iterations
> due to a divide-by-zero error :)

The kernel crashes? Sounds like your application crashes with a divide by
zero?


2007-11-15 20:24:35

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Thu, Nov 15, 2007 at 12:07:47PM -0800, Christoph Lameter wrote:
> On Thu, 15 Nov 2007, Micah Dowty wrote:
>
> > For reference, the exact test I used with git-bisect is attached. The
> > C program (priosched) starts two busy-looping threads and a
> > high-priority high-frequency thread which uses relatively little
> > CPU. The Python program repeatedly starts the C program, runs it for a
> > half second, and measures the resulting imbalance in CPU usage. On
> > kernels prior to the above commit, this reports values within about
> > 10% of 1.0. On later kernels, it crashes within a couple iterations
> > due to a divide-by-zero error :)
>
> The kernel crashes? Sounds like your application crashes with a divide by
> zero?

Yes, the Python test harness crashes, not the kernel. It's just
because on a kernel which exhibits this SMP balancer bug, within a
couple of test iterations I'll hit a case where cpu1 was almost
totally idle and the test harness divides by zero when calculating the
imbalance.

--Micah

2007-11-15 21:29:10

by Christoph Lameter

Subject: Re: High priority tasks break SMP balancer?

On Thu, 15 Nov 2007, Micah Dowty wrote:

> Yes, the Python test harness crashes, not the kernel. It's just
> because on a kernel which exhibits this SMP balancer bug, within a
> couple of test iterations I'll hit a case where cpu1 was almost
> totally idle and the test harness divides by zero when calculating the
> imbalance.

I still have some problems understanding what you are complaining about.
The patch that you bisected had some issues that were fixed later by
Siddha.

2007-11-15 21:35:39

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Thu, Nov 15, 2007 at 01:28:55PM -0800, Christoph Lameter wrote:
> On Thu, 15 Nov 2007, Micah Dowty wrote:
>
> > Yes, the Python test harness crashes, not the kernel. It's just
> > because on a kernel which exhibits this SMP balancer bug, within a
> > couple of test iterations I'll hit a case where cpu1 was almost
> > totally idle and the test harness divides by zero when calculating the
> > imbalance.
>
> I still have some problem understanding what you are complaining about?
> The patch that you bisected had some issues that were fixed later by
> Siddha.

On all kernels I've tested from after your patch was committed, I can
reproduce a problem where a single high-priority thread which wakes up
very frequently can artificially inflate the SMP balancer's load
average for one CPU, causing other tasks to be migrated off that
CPU. The result is that this high-priority thread (which may only use
a few percent CPU) gets an entire CPU to itself. Even if there are
several busy-looping threads running, this CPU will be mostly idle.

In the first email I CC'ed you on, I attached a test program that I've
been able to use to reproduce the issue. Feel free to ignore the
Python program; it's just a harness I was using to test for the
problem quickly during my git-bisect run. The relevant code is all in
priosched.c.

Thanks,
--Micah

2007-11-16 02:32:00

by Christoph Lameter

Subject: Re: High priority tasks break SMP balancer?

On Thu, 15 Nov 2007, Micah Dowty wrote:

> On all kernels I've tested from after your patch was committed, I can
> reproduce a problem where a single high-priority thread which wakes up
> very frequently can artificially inflate the SMP balancer's load
> average for one CPU, causing other tasks to be migrated off that
> CPU. The result is that this high-priority thread (which may only use
> a few percent CPU) gets an entire CPU to itself. Even if there are
> several busy-looping threads running, this CPU will be mostly idle.

I am a bit at a loss as to how this could relate to the patch. This looks
like a load balance logic issue that causes the load calculation to go
wrong?


2007-11-16 02:44:21

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Thu, Nov 15, 2007 at 06:31:49PM -0800, Christoph Lameter wrote:
> On Thu, 15 Nov 2007, Micah Dowty wrote:
>
> > On all kernels I've tested from after your patch was committed, I can
> > reproduce a problem where a single high-priority thread which wakes up
> > very frequently can artificially inflate the SMP balancer's load
> > average for one CPU, causing other tasks to be migrated off that
> > CPU. The result is that this high-priority thread (which may only use
> > a few percent CPU) gets an entire CPU to itself. Even if there are
> > several busy-looping threads running, this CPU will be mostly idle.
>
> I am a bit at a loss as to how this could relate to the patch. This looks
> like a load balance logic issue that causes the load calculation to go
> wrong?

My best guess is that this has something to do with the timing with
which we sample the CPU's instantaneous load when calculating the load
averages.. but I still understand only the basics of the scheduler and
SMP balancer. All I really know for sure at this point regarding your
patch is that git-bisect found it for me.

It almost seems like the load average algorithm is ignoring the CPU's
idle time, and only accounting for the time that CPU spends running
processes. One of the symptoms is that the mostly-idle CPU in my test
has an instantaneous load which is usually zero, but a very high load
average. (9000, 30000, etc.)

I want to help get to the bottom of this issue, but I was hoping that
someone experienced with the Linux scheduler and SMP balancer would
have some insight or some suggestions about what to try next.

Thanks,
--Micah

2007-11-16 06:07:29

by Ingo Molnar

Subject: Re: High priority tasks break SMP balancer?


* Micah Dowty <[email protected]> wrote:

> > I am a bit at a loss as to how this could relate to the patch. This
> > looks like a load balance logic issue that causes the load
> > calculation to go wrong?
>
> My best guess is that this has something to do with the timing with
> which we sample the CPU's instantaneous load when calculating the load
> averages.. but I still understand only the basics of the scheduler and
> SMP balancer. All I really know for sure at this point regarding your
> patch is that git-bisect found it for me.

hm, your code uses timeouts for this, right? The CPU load average that
is used for SMP load balancing is sampled from the scheduler tick - and
has been sampled from the scheduler tick for eons. v2.6.23 defaulted to
a different method but v2.6.24 samples it from the tick again. So my
guess is, your test code behaves similarly on 2.6.22 too, correct?

Ingo

2007-11-16 09:19:26

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Fri, Nov 16, 2007 at 07:07:00AM +0100, Ingo Molnar wrote:
> > My best guess is that this has something to do with the timing with
> > which we sample the CPU's instantaneous load when calculating the load
> > averages.. but I still understand only the basics of the scheduler and
> > SMP balancer. All I really know for sure at this point regarding your
> > patch is that git-bisect found it for me.
>
> hm, your code uses timeouts for this, right? The CPU load average that
> is used for SMP load balancing is sampled from the scheduler tick - and
> has been sampled from the scheduler tick for eons. v2.6.23 defaulted to
> a different method but v2.6.24 samples it from the tick again. So my
> guess is, your testcode behave similarly on 2.6.22 too, correct?

My sample program (and the VMware code which I originally observed
this behaviour with) wakes up frequently using /dev/rtc.

I know the problem occurs on 2.6.20 and 2.6.23.1. I haven't tried
2.6.22 yet, but I'll do so as soon as I can.

Thanks,
--Micah

2007-11-16 10:46:30

by Ingo Molnar

Subject: Re: High priority tasks break SMP balancer?


* Micah Dowty <[email protected]> wrote:

> On Fri, Nov 16, 2007 at 07:07:00AM +0100, Ingo Molnar wrote:
> > > My best guess is that this has something to do with the timing with
> > > which we sample the CPU's instantaneous load when calculating the load
> > > averages.. but I still understand only the basics of the scheduler and
> > > SMP balancer. All I really know for sure at this point regarding your
> > > patch is that git-bisect found it for me.
> >
> > hm, your code uses timeouts for this, right? The CPU load average that
> > is used for SMP load balancing is sampled from the scheduler tick - and
> > has been sampled from the scheduler tick for eons. v2.6.23 defaulted to
> > a different method but v2.6.24 samples it from the tick again. So my
> > guess is, your testcode behave similarly on 2.6.22 too, correct?
>
> My sample program (and the VMware code which I originally observed
> this behaviour with) wakes up frequently using /dev/rtc.
>
> I know the problem occurs on 2.6.20 and 2.6.23.1. I haven't tried
> 2.6.22 yet, but I'll do so as soon as I can.

ok, if you see it on .20 then no need to try .22.

Ingo

2007-11-16 10:48:17

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Fri, Nov 16, 2007 at 07:07:00AM +0100, Ingo Molnar wrote:
>
> * Micah Dowty <[email protected]> wrote:
>
> > > I am a bit at a loss as to how this could relate to the patch. This
> > > looks like a load balance logic issue that causes the load
> > > calculation to go wrong?
> >
> > My best guess is that this has something to do with the timing with
> > which we sample the CPU's instantaneous load when calculating the load
> > averages.. but I still understand only the basics of the scheduler and
> > SMP balancer. All I really know for sure at this point regarding your
> > patch is that git-bisect found it for me.
>
> hm, your code uses timeouts for this, right? The CPU load average that
> is used for SMP load balancing is sampled from the scheduler tick - and
> has been sampled from the scheduler tick for eons. v2.6.23 defaulted to
> a different method but v2.6.24 samples it from the tick again. So my
> guess is, your testcode behave similarly on 2.6.22 too, correct?

Interesting.. here are the kernels I've tested so far, not including
the git-bisect run I did between 2.6.19 and 2.6.20:

2.6.17 -
2.6.19 -
2.6.19.7 -
2.6.20 +
2.6.21 +
2.6.22 -
2.6.23.1 +

Here a "-" means that the problem does not occur (my test program uses
100% of both CPUs) and a "+" means that the test program leaves one
CPU mostly idle.

Unless I've made a mistake, 2.6.22 seems like the outlier rather than
2.6.23. Is this inconsistent with the scheduler tick hypothesis?

Thanks,
--Micah

2007-11-16 10:49:00

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

On 16/11/2007, Ingo Molnar <[email protected]> wrote:
>
> * Micah Dowty <[email protected]> wrote:
>
> > > I am a bit at a loss as to how this could relate to the patch. This
> > > looks like a load balance logic issue that causes the load
> > > calculation to go wrong?
> >
> > My best guess is that this has something to do with the timing with
> > which we sample the CPU's instantaneous load when calculating the load
> > averages.. but I still understand only the basics of the scheduler and
> > SMP balancer. All I really know for sure at this point regarding your
> > patch is that git-bisect found it for me.
>
> hm, your code uses timeouts for this, right? The CPU load average that
> is used for SMP load balancing is sampled from the scheduler tick - and
> has been sampled from the scheduler tick for eons. v2.6.23 defaulted to
> a different method but v2.6.24 samples it from the tick again.

yeah... but one of the kernels in question is actually 2.6.23.1, which
does use the 'precise' accounting.

Micah,

could you try to change either :

cat /proc/sys/kernel/sched_stat_granularity

put it to the value equal to a tick on your system

or just remove bit #3 (which is responsible for 8 == 1000) here:

cat /proc/sys/kernel/sched_features

(this one is enabled by default in 2.6.23.1)

anyway, when it comes to calculating rq->cpu_load[], a nice(0) cpu-hog
task (on cpu_0) may generate a similar load (contribute to
rq->cpu_load[]) as e.g. some negatively reniced task (on cpu_1) which
runs only periodically (say, once in a tick for N ms., etc.) [*]

The thing is that the higher a prio of the task, the bigger 'weight'
it has (prio_to_wait[] table in sched.c) ... and roughly, the load it
generates is not only 'proportional' to 'run-time per fixed interval
of time' but also to its 'weight'. That's why the [*] above.

so you may have a situation :

cpu_0 : e.g. a nice(-20) task running periodically every tick and
generating, say ~10% cpu load ;

cpu_1 : 2-3 nice(0) cpu-hog tasks ;

both cpus may be seen with similar rq->cpu_load[]... yeah, one would
argue that one of the cpu hogs could be migrated to cpu_0 and consume
remaining 'time slots' and it would not "disturb" the nice(-20) task
as :
it's able to preempt the lower prio task whenever it wants (provided
fine-grained kernel preemption) and we don't care that much about
thrashing of caches here.

btw., without the precise load balancing, there can be situations when
the nice(-20) (or say, an RT periodic task) may not be seen at all (i.e.
does not contribute to cpu_load[]) on cpu_0...
we do sampling every tick (sched.c :: update_cpu_load()) and consider
this_rq->ls.load.weight at this particular moment (that is the sum of
'weights' for all runnable tasks on this rq)... and it may well be
that the aforementioned high-priority task is just never (or more
likely, rarely) runnable at this particular moment (it runs for short
intervals of time in between ticks).
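
To make the effect concrete, here is a rough userspace simulation (not
kernel code: 88761 and 1024 are the nice(-20) and nice(0) entries of
prio_to_weight[] in 2.6.23, and the per-tick update below only
approximates what update_cpu_load() does):

#include <stdio.h>
#include <stdlib.h>

#define TICKS            10000
#define WEIGHT_NICE_M20  88761   /* prio_to_weight[] entry for nice -20 */
#define WEIGHT_NICE_0     1024   /* prio_to_weight[] entry for nice 0   */

int main(void)
{
    unsigned long cpu_load[5] = { 0 };  /* the mostly-idle CPU's averages */
    double p_runnable = 0.10;  /* -20 task caught runnable at 10% of ticks */
    int t, i;

    for (t = 0; t < TICKS; t++) {
        /* raw load sampled at this tick: the -20 task's weight if it
           happens to be runnable right now, otherwise 0 (CPU looks idle) */
        unsigned long this_load =
            (rand() / (double)RAND_MAX < p_runnable) ? WEIGHT_NICE_M20 : 0;

        for (i = 0; i < 5; i++) {
            unsigned long scale = 1UL << i;
            unsigned long old = cpu_load[i], new_load = this_load;

            if (new_load > old)
                new_load += scale - 1;  /* round up while load is rising */
            cpu_load[i] = (old * (scale - 1) + new_load) >> i;
        }
    }

    printf("cpu0 (mostly idle, runs the -20 waker) cpu_load[]:");
    for (i = 0; i < 5; i++)
        printf(" %lu", cpu_load[i]);
    printf("\ncpu1 (two always-runnable nice(0) hogs): ~%d\n",
           2 * WEIGHT_NICE_0);
    return 0;
}

With p_runnable around 0.1 the slower-decaying slots settle near
0.1 * 88761 ~= 8900, i.e. the kind of multi-thousand cpu_load[] values
seen on the mostly-idle CPU, while a CPU running two nice(0) hogs
reads only about 2048.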


>
> Ingo
>

--
Best regards,
Dmitry Adamushko

2007-11-16 19:13:57

by David Newall

Subject: Re: High priority tasks break SMP balancer?

There are a couple of points I would make about your python test
harness. Your program compares real+system jiffies for both cpus; an
ideal result would be 1.00. The measurement is taken over a relatively
short period of approximately a half-second, and you kill the CPU hogs
before taking final measurements, even wait for them to die first. You
repeat this measurement, starting and killing CPU hogs each time. Why
do you do that?

What happens if you start the hogs and take the baseline outside of the
loop?

from __future__ import division
import os, time

def getCpuTimes():
    # Sum user+system jiffies for cpu0 and cpu1 from /proc/stat
    cpu0 = 0
    cpu1 = 0
    for line in open("/proc/stat"):
        tokens = line.split()
        if tokens[0] == "cpu0":
            cpu0 = int(tokens[1]) + int(tokens[3])
        elif tokens[0] == "cpu1":
            cpu1 = int(tokens[1]) + int(tokens[3])
    return cpu0, cpu1

# Start the CPU hogs once, take a single baseline, then keep sampling
pid = os.spawnl(os.P_NOWAIT, "./priosched", "priosched")
baseline = getCpuTimes()
while True:
    time.sleep(0.5)
    current = getCpuTimes()
    print "%.04f" % ((current[0] - baseline[0]) /
                     (current[1] - baseline[1]))
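
With the hogs started once and the baseline taken outside the loop,
the printed ratio should stay near 1.00 while both CPUs are kept busy;
if one CPU goes nearly idle, the ratio should drift far from 1.00 (or
the division should blow up, as it does in your harness).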

2007-11-16 21:39:14

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Sat, Nov 17, 2007 at 05:43:33AM +1030, David Newall wrote:
> There are a couple of points I would make about your python test harness.
> Your program compares real+system jiffies for both cpus; an ideal result
> would be 1.00. The measurement is taken over a relatively short period of
> approximately a half-second, and you kill the CPU hogs before taking final
> measurements, even wait for them to die first. You repeat this
> measurement, starting and killing CPU hogs each time. Why do you do that?

The Python test harness is fairly artificial, but this is just the
best way I found to reliably reproduce the problem in a short amount
of time. It was just for convenience while running git-bisect. When
running the C program directly, there seems to be a somewhat random
chance that it will start up in the "bad" state. Once the single CPU
is stuck in this mostly-idle mode, it seems to stay that way for a
while.

> What happens if you start the hogs and take the baseline outside of the
> loop?

The problem still occurs then, but killing/restarting the test app
seems to trigger the problem more reliably. As I said in the original
email about this, left to its own devices this problem will occur
seemingly-randomly. In the original VMware code I observed this
problem in, the same process would flip between the "good" and "bad"
states seemingly randomly, every few seconds.

Thanks,
--Micah

2007-11-16 22:12:18

by Christoph Lameter

Subject: Re: High priority tasks break SMP balancer?

On Fri, 16 Nov 2007, Micah Dowty wrote:

> 2.6.17 -
> 2.6.19 -
> 2.6.19.7 -
> 2.6.20 +
> 2.6.21 +
> 2.6.22 -
> 2.6.23.1 +
>
> Here a "-" means that the problem does not occur (my test program uses
> 100% of both CPUs) and a "+" means that the test program leaves one
> CPU mostly idle.
>
> Unless I've made a mistake, 2.6.22 seems like the outlier rather than
> 2.6.23. Is this inconsistent with the scheduler tick hypothesis?

Siddha fixed an issue with the jiffy accounting in for the softirq
approach in.22 (vague recall maybe not exactly that version). This may be
consistent with an issue that was fixed and now surfaces because of
something else.

2007-11-16 22:14:24

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Fri, Nov 16, 2007 at 11:48:50AM +0100, Dmitry Adamushko wrote:
> could you try to change either :
>
> cat /proc/sys/kernel/sched_stat_granularity
>
> put it to the value equal to a tick on your system

This didn't seem to have any effect.

> or just remove bit #3 (which is responsible for 8 == 1000) here:
>
> cat /proc/sys/kernel/sched_features
>
> (this one is enabled by default in 2.6.23.1)

Aha. Turning off bit 3 appears to instantly fix my problem while it's
occurring in an existing process, and I can't reproduce it on any new
processes afterward.

> anyway, when it comes to calculating rq->cpu_load[], a nice(0) cpu-hog
> task (on cpu_0) may generate a similar load (contribute to
> rq->cpu_load[]) as e.g. some negatively reniced task (on cpu_1) which
> runs only periodically (say, once in a tick for N ms., etc.) [*]
>
> The thing is that the higher a prio of the task, the bigger 'weight'
> it has (prio_to_weight[] table in sched.c) ... and roughly, the load it
> generates is not only 'proportional' to 'run-time per fixed interval
> of time' but also to its 'weight'. That's why the [*] above.

Right. I gathered from reading the scheduler source earlier that the
load average is intended to be proportional to the priority of the
task, but I was really confused by the fairly nondeterministic effect
on the cpu_load average that my test process is having.

> so you may have a situation :
>
> cpu_0 : e.g. a nice(-20) task running periodically every tick and
> generating, say ~10% cpu load ;

Part of the problem may be that my high-priority task can run much
more often than every tick. In my test case and in the VMware code
that I originally observed the problem in, the thread can wake up
based on /dev/rtc or on a device IRQ. Either of these can happen much
more frequently than the scheduler tick, if I understand correctly.

> cpu_1 : 2-3 nice(0) cpu-hog tasks ;
>
> both cpus may be seen with similar rq->cpu_load[]...

When I try this, cpu0 has a cpu_load[] of over 10000 and cpu1 has a
load of 2048 or so.

> yeah, one would
> argue that one of the cpu hogs could be migrated to cpu_0 and consume
> remaining 'time slots' and it would not "disturb" the nice(-20) task
> as :
> it's able to preempt the lower prio task whenever it want (provided,
> fine-grained kernel preemption) and we don't care that much of
> trashing of caches here.

Yes, that's the behaviour I expected to see (and what my application
would prefer).

> btw., without the precise load balancing, there can be situations when
> the nice(-20) (or say, a RT periodic task) can be even not seen (i.e.
> don't contribute to cpu_load[]) on cpu_0...
> we do sampling every tick (sched.c :: update_cpu_load()) and consider
> this_rq->ls.load.weight at this particular moment (that is the sum of
> 'weights' for all runnable tasks on this rq)... and it may well be
> that the aforementioned high-priority task is just never (or likely,
> rarely) runnable at this particular moment (it runs for short interval
> of time in between ticks).

Indeed. I think this is the major contributor to the nondeterminism
I'm seeing.

Thanks much,
--Micah

2007-11-16 23:27:40

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

On 16/11/2007, Micah Dowty <[email protected]> wrote:
> [ ... ]
> > or just remove bit #3 (which is responsible for 8 == 1000) here:
> >
> > cat /proc/sys/kernel/sched_features
> >
> > (this one is enabled by default in 2.6.23.1)
>
> Aha. Turning off bit 3 appears to instantly fix my problem while it's
> occurring in an existing process, and I can't reproduce it on any new
> processes afterward.

humm... ok, but considering your recent summary for various kernels...
I guess, it doesn't qualify as the primary suspect... it just likely
affects something else.

>
> > cpu_1 : 2-3 nice(0) cpu-hog tasks ;
> >
> > both cpus may be seen with similar rq->cpu_load[]...
>
> When I try this, cpu0 has a cpu_load[] of over 10000 and cpu1 has a
> load of 2048 or so.

yeah, one of the options for 2048 would be presence of 2 nice(0)
cpu-hogs (1024 is the weight for a nice(0) task).


> > yeah, one would
> > argue that one of the cpu hogs could be migrated to cpu_0 and consume
> > remaining 'time slots' and it would not "disturb" the nice(-20) task
> > as :
> > it's able to preempt the lower prio task whenever it want (provided,
> > fine-grained kernel preemption) and we don't care that much of
> > trashing of caches here.
>
> Yes, that's the behaviour I expected to see (and what my application
> would prefer).

yep, that's what load_balance_newidle() is about... so maybe there are
some factors resulting in its inconsistency/behavioral differences on
different kernels.

Let's say we change a pattern for the niced task: e.g. run for 100 ms.
and then sleep for 300 ms. (that's ~25% of cpu load) in the loop. Any
behavioral changes?


>
> Thanks much,
> --Micah
>

--
Best regards,
Dmitry Adamushko

2007-11-17 01:05:27

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Sat, Nov 17, 2007 at 12:26:41AM +0100, Dmitry Adamushko wrote:
> Let's say we change a pattern for the niced task: e.g. run for 100 ms.
> and then sleep for 300 ms. (that's ~25% of cpu load) in the loop. Any
> behavioral changes?

For consistency, I tested this using /dev/rtc. I set the rtc frequency
to 16 Hz, and replaced the main loop of my high (-19) priority thread
with:

while (1) {
    unsigned long data;

    /* Sleep: block for three consecutive RTC ticks (3 * 62.5ms at 16 Hz) */
    for (i = 0; i < 3; i++) {
        if (read(rtc, &data, sizeof data) != sizeof data) {
            perror("read");
            return 1;
        }
    }

    /* Load: spin in non-blocking mode until the next tick arrives */
    fcntl(rtc, F_SETFL, O_NONBLOCK);
    while (read(rtc, &data, sizeof data) < 0);
    fcntl(rtc, F_SETFL, 0);
}

Now it's busy-looping for 62.5ms, and sleeping for three consecutive
62.5ms chunks totalling 187.5ms.

The results aren't quite what I was expecting. I have only observed
this so far in test cases where I have a very high wakeup frequency,
so I wasn't expecting this to work. I did, however, still observe the
problem where occasionally I get into a state where one CPU is mostly
idle.

Qualitatively, this feels a bit different. With the higher clock
frequency it seemed like the CPU would easily get "stuck" in this
state where it's mostly idle, and it would stay there for a long
time. With the low wakeup frequency, I'm seeing it toggle between the
busy and mostly-idle states more quickly.

I tried a similar test using usleep() and gettimeofday() rather than
/dev/rtc:

while (1) {
    usleep(300000);                  /* sleep ~300ms */

    /* busy-loop for ~100ms */
    gettimeofday(&t1, NULL);
    do {
        gettimeofday(&t2, NULL);
    } while (t2.tv_usec - t1.tv_usec +
             (t2.tv_sec - t1.tv_sec) * 1000000 < 100000);
}

With this test program, I haven't yet seen a CPU imbalance that lasts
longer than a fraction of a second.

--Micah

2007-11-17 19:10:48

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

Micah,

ok, would it be possible to get "cat /proc/schedstat" output at the
moment when you observe the 'problem'? So we could try to analyze
behavior of the load balancer (yeah, we should have probably started
with this step)

something like this:

(the problem appears)
# cat /proc/schedstat

... wait either a few seconds or until the problem disappears
(whatever comes first)
# cat /proc/schedstat


TIA,

>
> --Micah
>

--
Best regards,
Dmitry Adamushko

2007-11-19 18:51:27

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Sat, Nov 17, 2007 at 08:10:35PM +0100, Dmitry Adamushko wrote:
> Micah,
>
> ok, would it be possible to get "cat /proc/schedstat" output at the
> moment when you observe the 'problem'? So we could try to analyze
> behavior of the load balancer (yeah, we should have probably started
> with this step)

No problem. Here are the two schedstat snapshots. With my priosched.c
test program running with one CPU idle, I did a "cat /proc/schedstat",
waited a couple seconds, then repeated:

micah@micah-64:~$ cat /proc/schedstat
version 14
timestamp 4366722208
cpu0 0 0 0 785868111 0 828020771 12621812 22703872 14096041 9504937730116 23603703739504 815398959
domain0 03 8766051 8760123 95 14544377 6374 0 13 8760110 7369 6771 0 20835105 605 0 1 6770 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7026998 0 24826
cpu1 0 0 0 783505304 0 860700497 26129151 41776419 34749418 10682815341474 1753703225423 834571346
domain0 03 8685595 8677710 224 7037051 8127 0 76 8677634 7500 7320 7 471762 179 0 8 7312 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 8607831 0 36753
micah@micah-64:~$ cat /proc/schedstat
version 14
timestamp 4366723231
cpu0 0 0 0 785868111 0 828021128 12621812 22703965 14096130 9509029340381 23603703795868 815399316
domain0 03 8766051 8760123 95 14544377 6374 0 13 8760110 7374 6776 0 20835105 605 0 1 6775 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7027010 0 24826
cpu1 0 0 0 783505304 0 860708521 26132612 41780521 34753508 10683770853336 1753758309667 834575909
domain0 03 8685718 8677833 224 7037051 8127 0 76 8677757 7500 7320 7 471762 179 0 8 7312 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 8607835 0 36753

I had been experimenting with both schedstat and sched_debug while
trying to identify the original problem I observed with VMware's
code. During the course of that investigation, I wrote a little tool
to graph schedstat output in real-time.

The tool itself is in a public Subversion repository:
http://svn.navi.cx/misc/trunk/rtgraph/schedstat-grapher.py
(Requires the other Python modules in that directory, and PyGTK)

This screenshot was taken just after the priosched.c test program
flipped from the "bad" state to the "good" state. It had been running
on one CPU for a while, and something caused it to rebalance properly
onto both CPUs while this graph was being recorded. You can see that
in the "bad" state, cpu0 was trying to rebalance but it was failing
because there was no busier group (idle_nobusyg):

http://navi.cx/~micah/priosched-graph-1.png

While in the "bad" state, the sched_debug output shows clearly why we
were hitting idle_nobusyg. One CPU has both busy-loop threads in its
runqueue, the other CPU is mostly idle but it has an elevated CPU load
proportional to the high priority thread's priority boost.

micah@micah-64:~$ cat /proc/sched_debug
Sched Debug Version: v0.05-v20, 2.6.23.1 #1
now at 287991360677873 nsecs

cpu#0, 2593.095 MHz
.nr_running : 0
.load : 0
.ls.delta_fair : 27950
.ls.delta_exec : 1958562
.nr_switches : 42562597
.nr_load_updates : 71998668
.nr_uninterruptible : 2731
.jiffies : 4366886827
.next_balance : 4366886830
.curr->pid : 0
.clock : 288012673563036
.idle_clock : 0
.prev_clock_raw : 287377858327690
.clock_warps : 0
.clock_overflows : 50589
.clock_deep_idle_events : 0
.clock_max_delta : 4000250
.cpu_load[0] : 58784
.cpu_load[1] : 64246
.cpu_load[2] : 65333
.cpu_load[3] : 63586
.cpu_load[4] : 61992

cfs_rq
.fair_clock : 9034352848909
.exec_clock : 10184480087500
.wait_runtime : 0
.wait_runtime_overruns : 10356815
.wait_runtime_underruns : 1551195
.sleeper_bonus : 33752273
.wait_runtime_rq_sum : 0

runnable tasks:
task PID tree-key delta waiting switches prio sum-exec sum-wait sum-sleep wait-overrun wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------

cpu#1, 2593.095 MHz
.nr_running : 3
.load : 3072
.ls.delta_fair : 247507
.ls.delta_exec : 1148874
.nr_switches : 78415958
.nr_load_updates : 71998585
.nr_uninterruptible : -2731
.jiffies : 4366886827
.next_balance : 4366886976
.curr->pid : 16252
.clock : 288012340795477
.idle_clock : 0
.prev_clock_raw : 287377857592876
.clock_warps : 0
.clock_overflows : 18670
.clock_deep_idle_events : 0
.clock_max_delta : 4000250
.cpu_load[0] : 7136
.cpu_load[1] : 4591
.cpu_load[2] : 3319
.cpu_load[3] : 2691
.cpu_load[4] : 2391

cfs_rq
.fair_clock : 10089416808778
.exec_clock : 11099173384388
.wait_runtime : -74868320
.wait_runtime_overruns : 18282511
.wait_runtime_underruns : 1233484
.sleeper_bonus : 3602202
.wait_runtime_rq_sum : -74868320

runnable tasks:
task PID tree-key delta waiting switches prio sum-exec sum-wait sum-sleep wait-overrun wait-underrun
------------------------------------------------------------------------------------------------------------------------------------------------------------------
priosched 16248 10089447384328 30575550 -31397913 172 120 1665860833 -80591108 0 0 17
priosched 16249 10089453683739 36874961 -39698014 148 120 1670430581 -63070307 0 0 13
R cat 16252 10089420581171 3772393 -3772393 1 120 73354 -22393 90430 0 0


Thanks,
--Micah

2007-11-19 22:22:29

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

On 19/11/2007, Micah Dowty <[email protected]> wrote:
> [ ... ]
>
> micah@micah-64:~$ cat /proc/schedstat
> version 14
> timestamp 4366722208
> cpu0 0 0 0 785868111 0 828020771 12621812 22703872 14096041 9504937730116 23603703739504 815398959
> domain0 03 8766051 8760123 95 14544377 6374 0 13 8760110 7369 6771 0 20835105 605 0 1 6770 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7026998 0 24826
> cpu1 0 0 0 783505304 0 860700497 26129151 41776419 34749418 10682815341474 1753703225423 834571346
> domain0 03 8685595 8677710 224 7037051 8127 0 76 8677634 7500 7320 7 471762 179 0 8 7312 ===> 0 0 0 0 0 0 0 0 <=== 1 0 1 0 0 0 0 0 0 8607831 0 36753

> micah@micah-64:~$ cat /proc/schedstat
> version 14
> timestamp 4366723231
> cpu0 0 0 0 785868111 0 828021128 12621812 22703965 14096130 9509029340381 23603703795868 815399316
> domain0 03 8766051 8760123 95 14544377 6374 0 13 8760110 7374 6776 0 20835105 605 0 1 6775 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7027010 0 24826
> cpu1 0 0 0 783505304 0 860708521 26132612 41780521 34753508 10683770853336 1753758309667 834575909
> domain0 03 8685718 8677833 224 7037051 8127 0 76 8677757 7500 7320 7 471762 179 0 8 7312 ===> 0 0 0 0 0 0 0 0 <=== 1 0 1 0 0 0 0 0 0 8607835 0 36753

You seem to have a configuration with domains which don't have
SD_BALANCE_NEWIDLE on (CONFIG_NUMA?) as there are no events (all
zeros above) for CPU_NEWLY_IDLE.

this one is being triggered whenever a cpu becomes idle (schedule()
--> idle_balance() --> load_balance_newidle()).

(this flag is a bit #1 == 2)

cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags

?

it can be changed with "echo new_value > ..."

does a change of it make any difference?

moreover, /proc/sys/kernel/sched_domain/cpu1/domain0/newidle_idx seems
to determine the source of the load used for calculating the busiest
group, e.g. with newidle_idx == 0, the current load on the queue is
used instead of cpu_load[].


>
> Thanks,
> --Micah
>

--
Best regards,
Dmitry Adamushko

2007-11-19 23:05:27

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Mon, Nov 19, 2007 at 11:22:06PM +0100, Dmitry Adamushko wrote:
> You seem to have a configuration with domains which don't have
> SD_BALANCE_NEWIDLE on (CONFIG_NUMA?) as there are no events (all
> zeros above) for CPU_NEWLY_IDLE.
>
> this one is being triggered whenever a cpu becomes idle (schedule()
> --> idle_balance() --> load_balance_newidle()).
>
> (this flag is a bit #1 == 2)
>
> cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags

Hmm. I don't have this file on my system:

root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# ls
busy_factor busy_idx forkexec_idx idle_idx imbalance_pct max_interval min_interval newidle_idx wake_idx
root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# uname -a
Linux micah-64 2.6.23.1 #1 SMP Fri Nov 2 12:25:47 PDT 2007 x86_64 GNU/Linux

Is there a config option I'm missing?

Thanks,
--Micah

2007-11-20 05:58:29

by Ingo Molnar

Subject: Re: High priority tasks break SMP balancer?


* Micah Dowty <[email protected]> wrote:

> > this one is being triggered whenever a cpu becomes idle (schedule()
> > --> idle_balance() --> load_balance_newidle()).
> >
> > (this flag is a bit #1 == 2)
> >
> > cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
>
> Hmm. I don't have this file on my system:
>
> root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# ls
> busy_factor busy_idx forkexec_idx idle_idx imbalance_pct max_interval min_interval newidle_idx wake_idx
> root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# uname -a
> Linux micah-64 2.6.23.1 #1 SMP Fri Nov 2 12:25:47 PDT 2007 x86_64 GNU/Linux
>
> Is there a config option I'm missing?

yes, CONFIG_SCHED_DEBUG.

Ingo

2007-11-20 18:06:54

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Tue, Nov 20, 2007 at 06:57:55AM +0100, Ingo Molnar wrote:
>
> * Micah Dowty <[email protected]> wrote:
>
> > > this one is being triggered whenever a cpu becomes idle (schedule()
> > > --> idle_balance() --> load_balance_newidle()).
> > >
> > > (this flag is a bit #1 == 2)
> > >
> > > cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
> >
> > Hmm. I don't have this file on my system:
> >
> > root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# ls
> > busy_factor busy_idx forkexec_idx idle_idx imbalance_pct max_interval min_interval newidle_idx wake_idx
> > root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# uname -a
> > Linux micah-64 2.6.23.1 #1 SMP Fri Nov 2 12:25:47 PDT 2007 x86_64 GNU/Linux
> >
> > Is there a config option I'm missing?
>
> yes, CONFIG_SCHED_DEBUG.

I have that one. I even posted the /proc/sched_debug output :)

--Micah

2007-11-20 21:48:22

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

On 20/11/2007, Micah Dowty <[email protected]> wrote:
> On Tue, Nov 20, 2007 at 06:57:55AM +0100, Ingo Molnar wrote:
> >
> > * Micah Dowty <[email protected]> wrote:
> >
> > > > this one is being triggered whenever a cpu becomes idle (schedule()
> > > > --> idle_balance() --> load_balance_newidle()).
> > > >
> > > > (this flag is a bit #1 == 2)
> > > >
> > > > cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
> > >
> > > Hmm. I don't have this file on my system:
> > >
> > > root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# ls
> > > busy_factor busy_idx forkexec_idx idle_idx imbalance_pct max_interval min_interval newidle_idx wake_idx
> > > root@micah-64:/proc/sys/kernel/sched_domain/cpu0/domain0# uname -a
> > > Linux micah-64 2.6.23.1 #1 SMP Fri Nov 2 12:25:47 PDT 2007 x86_64 GNU/Linux
> > >
> > > Is there a config option I'm missing?
> >
> > yes, CONFIG_SCHED_DEBUG.
>
> I have that one. I even posted the /proc/sched_debug output :)

ah ok, try applying this patch on top of 2.6.23.1.

btw., what's your system? If I recall right, SD_BALANCE_NEWIDLE is on
by default for all configs, except for NUMA nodes.

(attached a white-space non-damaged version)

---

--- kernel/sched.c-old 2007-11-20 22:33:22.000000000 +0100
+++ kernel/sched.c 2007-11-20 22:37:07.000000000 +0100
@@ -5306,7 +5306,7 @@ set_table_entry(struct ctl_table *entry,
static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
- struct ctl_table *table = sd_alloc_ctl_entry(14);
+ struct ctl_table *table = sd_alloc_ctl_entry(12);

set_table_entry(&table[0], "min_interval", &sd->min_interval,
sizeof(long), 0644, proc_doulongvec_minmax);
@@ -5326,10 +5326,10 @@ sd_alloc_ctl_domain_table(struct sched_d
sizeof(int), 0644, proc_dointvec_minmax);
set_table_entry(&table[8], "imbalance_pct", &sd->imbalance_pct,
sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[10], "cache_nice_tries",
+ set_table_entry(&table[9], "cache_nice_tries",
&sd->cache_nice_tries,
sizeof(int), 0644, proc_dointvec_minmax);
- set_table_entry(&table[12], "flags", &sd->flags,
+ set_table_entry(&table[10], "flags", &sd->flags,
sizeof(int), 0644, proc_dointvec_minmax);

return table;

---

>
> --Micah
>
>


--
Best regards,
Dmitry Adamushko


Attachments:
sd_alloc.patch (1.01 kB)

2007-11-22 07:47:00

by Micah Elizabeth Scott

Subject: Re: High priority tasks break SMP balancer?

On Tue, Nov 20, 2007 at 10:47:52PM +0100, Dmitry Adamushko wrote:
> btw., what's your system? If I recall right, SD_BALANCE_NEWIDLE is on
> by default for all configs, except for NUMA nodes.

It's a dual AMD64 Opteron.

So, I recompiled my 2.6.23.1 kernel without NUMA support, and with
your patch for scheduling domain flags in /proc. It looks like with
NUMA disabled, my test case no longer shows the CPU imbalance
problem. Cool.

With NUMA disabled (and my test running smoothly), the flags show that
SD_BALANCE_NEWIDLE is set:

root@micah-64:~# cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
55

Next I turned it off:

root@micah-64:~# echo 53 > /proc/sys/kernel/sched_domain/cpu0/domain0/flags
root@micah-64:~# echo 53 > /proc/sys/kernel/sched_domain/cpu1/domain0/flags

Oddly enough, I still don't observe the CPU imbalance problem.

Now I reboot into a kernel which has NUMA re-enabled but which is
otherwise identical. I verify that now I can reproduce the CPU
imbalance again.

root@micah-64:~# cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
1101

Now I set cpu[10]/domain0/flags to 1099, and the imbalance immediately
disappears. I can reliably cause the imbalance again by setting it
back to 1101, and remove the imbalance by setting them to 1099.

Do these results make sense? I'm not sure I understand how
SD_BALANCE_NEWIDLE could be the whole story, since my /proc/schedstat
graphs do show that we continuously try to balance on idle, but we
can't successfully do so because the idle CPU has a much higher load
than the non-idle CPU. I don't understand how the problem I'm seeing
could be related to the time at which we run the balancer, rather than
being related to the load average calculation.

Assuming the CPU imbalance I'm seeing is actually related to
SD_BALANCE_NEWIDLE being unset, I have a couple of questions:

- Is this intended/expected behaviour for a machine without
NEWIDLE set? I'm not familiar with the rationale for disabling
this flag on NUMA systems.

- Is there a good way to detect, without any kernel debug flags
set, whether the current machine has any scheduling domains
that are missing the SD_BALANCE_NEWIDLE bit? I'm looking for
a good way to work around the problem I'm seeing with VMware's
code. Right now the best I can do is disable all thread priority
elevation when running on an SMP machine with Linux 2.6.20 or
later.

Thank you again for all your help.
--Micah

2007-11-22 12:53:18

by Dmitry Adamushko

Subject: Re: High priority tasks break SMP balancer?

On 22/11/2007, Micah Dowty <[email protected]> wrote:
> On Tue, Nov 20, 2007 at 10:47:52PM +0100, Dmitry Adamushko wrote:
> > btw., what's your system? If I recall right, SD_BALANCE_NEWIDLE is on
> > by default for all configs, except for NUMA nodes.
>
> It's a dual AMD64 Opteron.
>
> So, I recompiled my 2.6.23.1 kernel without NUMA support, and with
> your patch for scheduling domain flags in /proc. It looks like with
> NUMA disabled, my test case no longer shows the CPU imbalance
> problem. Cool.
>
> With NUMA disabled (and my test running smoothly), the flags show that
> SD_BALANCE_NEWIDLE is set:
>
> root@micah-64:~# cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
> 55
>
> Next I turned it off:
>
> root@micah-64:~# echo 53 > /proc/sys/kernel/sched_domain/cpu0/domain0/flags
> root@micah-64:~# echo 53 > /proc/sys/kernel/sched_domain/cpu1/domain0/flags
>
> Oddly enough, I still don't observe the CPU imbalance problem.

Look for the 'load_idx' variable in find_busiest_group() and how it's used.

include/linux/topology.h (+ some architectures define additional
node-definition-structures, but that's primarily for NUMA)

There you can find : 'busy_idx', 'idle_idx' and 'newidle_idx' (amongst
others that affect load-balancing).

Those parameters specify the _source_ of load for determining the
busiest group for !CPU_IDLE, CPU_IDLE and CPU_NEWLY_IDLE cases
respectively.

when 'load_idx == 0' (say, when CPU_NEWLY_IDLE and 'newidle_idx == 0'
or when CPU_IDLE and 'idle_idx == 0') the cpu_load[] table is _not_
taken into account by {source,target}_load(), and the raw load on the
rq is considered instead. In case of CPU_NEWLY_IDLE the raw load is
always 0 (no tasks on the queue).

so for NUMA, newidle_idx = 0 and it also seems to be a generic option
for other variants on 2.6.23.1. Hence, it doesn't matter how big the
cpu_load[] values are; they are just not taken into account.
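
Roughly, the selection in 2.6.23 looks like the sketch below
(paraphrased and simplified from sched.c, so treat the exact shape as
approximate; weighted_cpuload() is the raw sum of the weights
currently queued on that CPU):

/* load_idx == 0 short-circuits the decayed cpu_load[] history and
   falls back to the instantaneous runqueue load. */
static unsigned long source_load(int cpu, int type)
{
        struct rq *rq = cpu_rq(cpu);
        unsigned long total = weighted_cpuload(cpu);

        if (type == 0)      /* e.g. CPU_NEWLY_IDLE with newidle_idx == 0 */
                return total;

        return min(rq->cpu_load[type-1], total);
}

static unsigned long target_load(int cpu, int type)
{
        struct rq *rq = cpu_rq(cpu);
        unsigned long total = weighted_cpuload(cpu);

        if (type == 0)
                return total;

        return max(rq->cpu_load[type-1], total);
}

So with newidle_idx == 0 the newly-idle CPU's inflated cpu_load[]
history is simply never consulted, whereas the periodic CPU_IDLE
rebalance on NUMA (idle_idx != 0) keeps comparing against that history
and concludes there is no busier group, which would match the
idle_nobusyg counts in the schedstat graphs.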

>
> Now I reboot into a kernel which has NUMA re-enabled but which is
> otherwise identical. I verify that now I can reproduce the CPU
> imbalance again.

for NUMA, 'idle_idx != 0' (i.e. cpu_load[] is taken into account) for
the CPU_IDLE case
and
'newidle_idx == 0' (just because, SD_BALANCE_NEWIDLE is normally not
used on NUMA) for CPU_NEWLY_IDLE.

>
> root@micah-64:~# cat /proc/sys/kernel/sched_domain/cpu0/domain0/flags
> 1101
>
> Now I set cpu[10]/domain0/flags to 1099, and the imbalance immediately
> disappears. I can reliably cause the imbalance again by setting it
> back to 1101, and remove the imbalance by setting them to 1099.

I guess you meant it to be the other way around -- 1101 when it's switched on.

>
> Do these results make sense? I'm not sure I understand how
> SD_BALANCE_NEWIDLE could be the whole story, since my /proc/schedstat
> graphs do show that we continuously try to balance on idle, but we
> can't successfully do so because the idle CPU has a much higher load
> than the non-idle CPU.

(1) CPU_IDLE and idle_idx != 0 --> cpu_load[] is considered

(2) CPU_NEWLY_IDLE and newidle_idx == 0 --> raw load is considered (which
is 0 in this case).


> - Is this intended/expected behaviour for a machine without
> NEWIDLE set? I'm not familiar with the rationale for disabling
> this flag on NUMA systems.

As you may see, there are various node configs in
include/linux/topology.h that affect load-balancing. Migrations
between NUMA nodes can be more expensive than e.g. between cpus on
'normal' SMP, multi-core, etc. So it should be about being more
conservative when doing load-balancing on some setups.

e.g. in your particular case, what happens is smth like this:

SD_BALANCE_NEWIDLE + newidle_idx == 0

cpu #0: 2 nice(0) cpu-hogs
cpu #1: nice(-15) blocks...
cpu #1: load_balance_newidle() is triggered (*)
cpu #1: newidle_idx == 0 so the busiest group is on cpu #0
cpu #1: nice(0) (which is != current) is pulled over to cpu #1
cpu #1: nice(0) is running until it's preempted by nice(-15)
cpu #1: nice(-15) is running
...
in the meantime, a rebalance_domain event can be triggered on
cpu #0: and due to the huge load on cpu #1, nice(0) is pulled over
(back) to cpu #0
...
this situation may repeat again and again --> nice(0) tasks are being
pulled back and forth.

Other variants:

- the normal rebalance_domain event can be triggered at (*) and,
provided idle_idx == 0, cpu_load[] is not considered again and one of
the nice(0) can be migrated (a CPU_IDLE case).

the difference is that CPU_NEWLY_IDLE is triggered immediately after a
CPU becomes idle, while it may take some time for a normal rebalance
event (with CPU_IDLE in this case) to be triggered.

- moreover, I observed cpu_load[] from /proc/sched_debug while running
your test application and I did see from time to time cases when
cpu_load[] is low (both cpus are comparable, load-wise) on the cpu
where nice(-15) is normally running (I even used cpu_affinity to get
it bound to it)... the time for which nice(-15) is _not_ running can
be enough to degrade some of the slots of cpu_load[] back to 0...
should the rebalance_domain be triggered at this particular moment and
depending on sd->busy_idx, one of the nice(0) tasks can be migrated.

Note, different versions of kernels may have (and have) different
variants of tunings for domains, plus there can be some differences in
generic behavior of the load balancer.

> - Is there a good way to detect, without any kernel debug flags
> set, whether the current machine has any scheduling domains
> that are missing the SD_BALANCE_NEWIDLE bit?

e.g. by reading its config and kernel version. But as a generic way
(sort of API for user-space), I guess, no.

> I'm looking for
> a good way to work around the problem I'm seeing with VMware's
> code. Right now the best I can do is disable all thread priority
> elevation when running on an SMP machine with Linux 2.6.20 or
> later.

why does your application depend on the load-balancer's decisions?
Maybe something is just wrong with its logic instead? :-/


>
> Thank you again for all your help.
> --Micah
>

--
Best regards,
Dmitry Adamushko

2007-11-26 19:44:21

by Micah Elizabeth Scott

[permalink] [raw]
Subject: Re: High priority tasks break SMP balancer?

Dmitry,

Thank you for the detailed explanation of the scheduler behaviour I've
been seeing.

On Thu, Nov 22, 2007 at 01:53:02PM +0100, Dmitry Adamushko wrote:
> > - Is there a good way to detect, without any kernel debug flags
> > set, whether the current machine has any scheduling domains
> > that are missing the SD_BALANCE_NEWIDLE bit?
>
> e.g. by reading its config and kernel version. But as a generic way
> (a sort of user-space API), I guess, no.

Would it be reasonable perhaps to look for the numa_* entries in
/proc/zoneinfo, vmstat, or slabinfo?
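
Something along these lines, maybe (just a sketch of the idea; the
choice of /proc/vmstat and the "numa_" prefix are my assumptions, and
this only detects a NUMA-capable kernel, not an actual multi-node
machine):

#include <stdio.h>
#include <string.h>

/* guess whether the kernel was built with NUMA support by looking
 * for "numa_*" counters in /proc/vmstat */
static int kernel_looks_numa(void)
{
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");
	int found = 0;

	if (!f)
		return 0;
	while (fgets(line, sizeof line, f)) {
		if (strncmp(line, "numa_", 5) == 0) {
			found = 1;
			break;
		}
	}
	fclose(f);
	return found;
}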

> > I'm looking for
> > a good way to work around the problem I'm seeing with VMware's
> > code. Right now the best I can do is disable all thread priority
> > elevation when running on an SMP machine with Linux 2.6.20 or
> > later.
>
> why does your application depend on the load-balancer's decisions?
> Maybe something is just wrong with its logic instead? :-/

The application doesn't really depend on the load-balancer's decisions
per se, it just happens that this behaviour I'm seeing on NUMA systems
is extremely bad for performance.

In this context, the application is a virtual machine runtime which is
executing either an SMP VM or it's executing a guest which has a
virtual GPU. In either case, there are at least three threads:

- Two virtual CPU/GPU threads, which are nice(0) and often CPU-bound
- A low-latency event handling thread, at nice(-10)

The event handling thread performs periodic tasks like delivering
timer interrupts and completing I/O operations. It isn't expected to
use much CPU, but it does run very frequently. To get the best latency
performance, we've been running it at nice(-10). The intention is that
this event handling thread should usually be able to preempt either of
the CPU-bound threads.
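
For concreteness, the elevation is essentially the following (a sketch,
not our actual code; with NPTL each thread is its own kernel task, so
setpriority() on the thread's tid affects only that thread, and a
negative nice value needs root or CAP_SYS_NICE):

#include <sys/syscall.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* run the calling (event handling) thread at nice(-10) */
static int boost_event_thread(void)
{
	pid_t tid = syscall(SYS_gettid);

	return setpriority(PRIO_PROCESS, tid, -10);
}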

The behaviour we expect is that on a system which is otherwise lightly
loaded, the two CPU-bound threads each get nearly an entire physical
CPU to themselves. The event handling thread periodically wakes up,
preempts one of the CPU-bound threads briefly, then goes to sleep and
yields the CPU back to one of the CPU-bound threads.

The actual behaviour I see when the load balancer makes this
unfavorable decision is that the two CPU-bound threads share a single
CPU, and the other CPU is mostly idle. Best case, the virtual machine
runs half as fast as it could. Worst case, it runs much slower,
because the CPU-bound threads often need to wait for each other to
catch up if they don't always run at about the same rate. This
involves a lot of extra synchronization overhead.

My current workaround for this is to avoid ever boosting our thread
priority on a kernel in which this problem could occur. Currently this
is any 2.6.20 or later kernel running with at least two CPUs. I'd like
to narrow this test now to only cover kernels with NUMA enabled.
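
Roughly, that check amounts to something like this (a simplified
sketch, not our actual code; the thresholds are just the ones
described above):

#include <stdio.h>
#include <sys/utsname.h>
#include <unistd.h>

/* return nonzero if boosting the event thread's priority looks safe */
static int priority_boost_is_safe(void)
{
	struct utsname u;
	int maj = 0, min = 0, patch = 0;

	if (sysconf(_SC_NPROCESSORS_ONLN) < 2)
		return 1;		/* UP machine: no balancer issue */
	if (uname(&u) != 0)
		return 0;		/* unknown kernel: play it safe */
	sscanf(u.release, "%d.%d.%d", &maj, &min, &patch);
	if (maj > 2 || (maj == 2 && min > 6) ||
	    (maj == 2 && min == 6 && patch >= 20))
		return 0;		/* 2.6.20+ on SMP: don't boost */
	return 1;
}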

Hopefully this explains why the scheduler behaviour I've observed is
undesirable for this particular workload. I would be surprised if I'm
the only one who has a workload that is negatively impacted by it, but
it does seem difficult to reconcile this workload's requirements with
the cache properties expected from a NUMA-aware scheduler.

Is there something I'm doing wrong, or is this just a weakness in the
current scheduler implementation? If it's the latter, are there any
planned improvements?

Thank you again,
--Micah

2007-11-27 09:21:23

by Dmitry Adamushko

[permalink] [raw]
Subject: Re: High priority tasks break SMP balancer?

On 26/11/2007, Micah Dowty <[email protected]> wrote:
>
> The application doesn't really depend on the load-balancer's decisions
> per se, it just happens that this behaviour I'm seeing on NUMA systems
> is extremely bad for performance.
>
> In this context, the application is a virtual machine runtime which is
> executing either an SMP VM or it's executing a guest which has a
> virtual GPU. In either case, there are at least three threads:
>
> - Two virtual CPU/GPU threads, which are nice(0) and often CPU-bound
> - A low-latency event handling thread, at nice(-10)
>
> The event handling thread performs periodic tasks like delivering
> timer interrupts and completing I/O operations.

Are I/O operations initiated by these "virtual CPU/GPU threads"?

If so, would it make sense to have per-CPU event handling threads
(instead of one global one)? They would handle I/O operations initiated
from their respective CPUs to (hopefully) achieve better data locality
(esp. if most of the involved data is per-CPU).

Then let the load balancer evenly distribute the "virtual CPU/GPU
threads", or even (at least as an experiment) pin them to different
CPUs as well?

Sure, the scenario is highly dependent on the nature of those
'events'... and I can only speculate here :-) (but I'd imagine
situations where such a scheme would scale better).
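
e.g. the pinning experiment could be as simple as this (just a sketch,
using glibc's non-portable affinity call):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* pin a virtual CPU thread (and its per-CPU event thread) to one cpu */
static int pin_thread_to_cpu(pthread_t thread, int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return pthread_setaffinity_np(thread, sizeof(set), &set);
}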


>
> Thank you again,
> --Micah
>

--
Best regards,
Dmitry Adamushko

2007-11-27 17:13:39

by Micah Elizabeth Scott

[permalink] [raw]
Subject: Re: High priority tasks break SMP balancer?

On Tue, Nov 27, 2007 at 10:21:12AM +0100, Dmitry Adamushko wrote:
> On 26/11/2007, Micah Dowty <[email protected]> wrote:
> >
> > The application doesn't really depend on the load-balancer's decisions
> > per se, it just happens that this behaviour I'm seeing on NUMA systems
> > is extremely bad for performance.
> >
> > In this context, the application is a virtual machine runtime which is
> > executing either an SMP VM or it's executing a guest which has a
> > virtual GPU. In either case, there are at least three threads:
> >
> > - Two virtual CPU/GPU threads, which are nice(0) and often CPU-bound
> > - A low-latency event handling thread, at nice(-10)
> >
> > The event handling thread performs periodic tasks like delivering
> > timer interrupts and completing I/O operations.
>
> Are I/O operations initiated by these "virtual CPU/GPU threads"?
>
> If so, would it make sense to have per-CPU event handling threads
> (instead of one global one)? They would handle I/O operations initiated
> from their respective CPUs to (hopefully) achieve better data locality
> (esp. if most of the involved data is per-CPU).
>
> Then let the load balancer evenly distribute the "virtual CPU/GPU
> threads", or even (at least as an experiment) pin them to different
> CPUs as well?
>
> Sure, the scenario is highly dependent on the nature of those
> 'events'... and I can only speculate here :-) (but I'd imagine
> situations where such a scheme would scale better).

We already do this when it makes sense to. The high-priority "event"
thread is mostly used for asynchronous events: timers (driven by
/dev/rtc), completion of disk or USB I/O, etc. These are things that
happen asynchronously to the virtual CPU, and which may or may not
need to interrupt the virtual CPU at some point.

Another good example is audio. We have another thread (which ideally
is high-priority) which processes audio events asynchronously from the
virtual CPUs. The CPUs manipulate some shared memory to queue up audio
data, but while the audio is playing there is no direct control path
between the two threads. The virtual CPU reads/writes audio registers
whenever it needs to emulate accesses to the audio device, and the
audio thread wakes up any time the hardware needs more samples to
play. The two threads run asynchronously, with very little
synchronization between them.
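
Conceptually, the shared-memory path is just a single-producer/
single-consumer ring, something like this purely illustrative sketch
(the names and sizes are made up, it is not our code, and memory
barriers are omitted for brevity):

#define RING_SIZE 4096			/* power of two, illustrative only */

struct audio_ring {
	volatile unsigned int head;	/* advanced only by the virtual CPU */
	volatile unsigned int tail;	/* advanced only by the audio thread */
	short samples[RING_SIZE];
};

/* virtual CPU side: queue one sample, if there is room */
static int ring_put(struct audio_ring *r, short s)
{
	unsigned int next = (r->head + 1) & (RING_SIZE - 1);

	if (next == r->tail)
		return 0;		/* full */
	r->samples[r->head] = s;
	r->head = next;
	return 1;
}

/* audio thread side: pull the next sample, if any */
static int ring_get(struct audio_ring *r, short *s)
{
	if (r->tail == r->head)
		return 0;		/* empty */
	*s = r->samples[r->tail];
	r->tail = (r->tail + 1) & (RING_SIZE - 1);
	return 1;
}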

Hopefully that clarifies our situation a bit. Thanks again,
--Micah