Date: Tue, 1 Jun 2004 22:46:03 +0200
From: Ingo Molnar <mingo@elte.hu>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>, linux-kernel@vger.kernel.org,
       Andrew Morton <akpm@osdl.org>
Subject: Re: [PATCH] active_load_balance() deadlock
Message-ID: <20040601204603.GA20535@elte.hu>
References: <200406011409.54478.bjorn.helgaas@hp.com> <Pine.LNX.4.58.0406011316190.14095@ppc970.osdl.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <Pine.LNX.4.58.0406011316190.14095@ppc970.osdl.org>
User-Agent: Mutt/1.4.1i
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2173
Lines: 55


* Linus Torvalds <torvalds@osdl.org> wrote:

> On Tue, 1 Jun 2004, Bjorn Helgaas wrote:
> >
> > active_load_balance() looks susceptible to deadlock when busiest==rq.
> > Without the following patch, my 128-way box deadlocks consistently
> > during boot-time driver init.
> 
> Makes sense. The regular "load_balance()" already has that test,
> although it also makes it a WARN_ON() for some unexplained reason (I
> assume find_busiest_group() isn't supposed to find the local group,
> although it doesn't seem to be documented anywhere).
> 
> Ingo, Andrew?

looks good to me. The condition is 'impossible', but the whole balancing
code is (intentionally) a bit racy:

                cpus_and(tmp, group->cpumask, cpu_online_map);
                if (!cpus_weight(tmp))
                        goto next_group;

                for_each_cpu_mask(i, tmp) {
                        if (!idle_cpu(i))
                                goto next_group;
                        push_cpu = i;
                }

                rq = cpu_rq(push_cpu);
                double_lock_balance(busiest, rq);
                move_tasks(rq, push_cpu, busiest, 1, sd, IDLE);

in the for_each_cpu_mask() loop we specifically check that every CPU in
the target group is idle, so push_cpu's runqueue == busiest [== the
current runqueue] should never be true: the current CPU is not idle,
since we are running in the migration thread. But this is not a real
problem. We do load-balancing in a racy way to reduce overhead [it's all
statistics anyway, so absolute accuracy is impossible], and active
balancing itself is somewhat racy because the migration-thread wakeup
(and the active_balance flag) happen outside the runqueue locks [for
similar reasons].
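
to make the failure mode concrete, here is a minimal userspace sketch, not
the kernel code. It assumes the fix has the shape of a "busiest == rq"
test placed before the second lock is taken. struct toy_rq, toy_lock(),
toy_unlock() and toy_active_balance() are hypothetical names made up for
this illustration; the lock is modelled as a plain flag so that the
self-deadlock shows up as a return value instead of a hang:

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Toy stand-in for a per-CPU runqueue; NOT the kernel's struct. */
struct toy_rq {
	bool locked;		/* models a non-recursive spinlock */
	int nr_running;		/* number of runnable tasks */
};

static struct toy_rq runqueues[NR_CPUS];

/* Returns false where a real spinlock would spin forever. */
static bool toy_lock(struct toy_rq *rq)
{
	if (rq->locked)
		return false;
	rq->locked = true;
	return true;
}

static void toy_unlock(struct toy_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/*
 * Push one task from 'busiest' to the runqueue of 'push_cpu'.
 * The caller already holds busiest's lock.
 * Returns 1 if a task moved, 0 if nothing was done,
 * -1 if the second lock acquisition would self-deadlock.
 */
static int toy_active_balance(struct toy_rq *busiest, int push_cpu, bool guard)
{
	struct toy_rq *rq = &runqueues[push_cpu];
	int moved = 0;

	/* Assumed shape of the fix: skip when the racy pick is ourselves. */
	if (guard && rq == busiest)
		return 0;

	if (!toy_lock(rq))
		return -1;	/* second acquire of a lock we already hold */

	if (busiest->nr_running > 1) {
		busiest->nr_running--;
		rq->nr_running++;
		moved = 1;
	}

	toy_unlock(rq);
	return moved;
}
```

with runqueue 0 locked by the caller and holding three tasks, calling
toy_active_balance() with push_cpu 0 and guard off returns -1 (the
modelled deadlock), the same call with guard on returns 0, and a call
with push_cpu 1 moves one task over.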

so it all looks quite plausible - normal SMP boxes don't trigger it,
but Bjorn's 128-CPU setup with a non-trivial domain hierarchy does.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

	Ingo