Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S933658AbbDUQJG (ORCPT ); Tue, 21 Apr 2015 12:09:06 -0400 Received: from youngberry.canonical.com ([91.189.89.112]:57472 "EHLO youngberry.canonical.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932217AbbDUPeH (ORCPT ); Tue, 21 Apr 2015 11:34:07 -0400 From: Luis Henriques To: linux-kernel@vger.kernel.org, stable@vger.kernel.org, kernel-team@lists.ubuntu.com Cc: Michael Ellerman , Anton Blanchard , Luis Henriques Subject: [PATCH 3.16.y-ckt 074/144] powerpc/smp: Wait until secondaries are active & online Date: Tue, 21 Apr 2015 16:30:59 +0100 Message-Id: <1429630329-21748-75-git-send-email-luis.henriques@canonical.com> X-Mailer: git-send-email 1.9.1 In-Reply-To: <1429630329-21748-1-git-send-email-luis.henriques@canonical.com> References: <1429630329-21748-1-git-send-email-luis.henriques@canonical.com> X-Extended-Stable: 3.16 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2618 Lines: 66 3.16.7-ckt10 -stable review patch. If anyone has any objections, please let me know. ------------------ From: Michael Ellerman commit 875ebe940d77a41682c367ad799b4f39f128d3fa upstream. Anton has a busy ppc64le KVM box where guests sometimes hit the infamous "kernel BUG at kernel/smpboot.c:134!" issue during boot: BUG_ON(td->cpu != smp_processor_id()); Basically a per CPU hotplug thread scheduled on the wrong CPU. The oops output confirms it: CPU: 0 Comm: watchdog/130 The problem is that we aren't ensuring the CPU active bit is set for the secondary before allowing the master to continue on. The master unparks the secondary CPU's kthreads and the scheduler looks for a CPU to run on. It calls select_task_rq() and realises the suggested CPU is not in the cpus_allowed mask. It then ends up in select_fallback_rq(), and since the active bit isnt't set we choose some other CPU to run on. This seems to have been introduced by 6acbfb96976f "sched: Fix hotplug vs. set_cpus_allowed_ptr()", which changed from setting active before online to setting active after online. However that was in turn fixing a bug where other code assumed an active CPU was also online, so we can't just revert that fix. The simplest fix is just to spin waiting for both active & online to be set. We already have a barrier prior to set_cpu_online() (which also sets active), to ensure all other setup is completed before online & active are set. Fixes: 6acbfb96976f ("sched: Fix hotplug vs. set_cpus_allowed_ptr()") Signed-off-by: Michael Ellerman Signed-off-by: Anton Blanchard Signed-off-by: Michael Ellerman Signed-off-by: Luis Henriques --- arch/powerpc/kernel/smp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c index 1007fb802e6b..383859d125de 100644 --- a/arch/powerpc/kernel/smp.c +++ b/arch/powerpc/kernel/smp.c @@ -546,8 +546,8 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle) if (smp_ops->give_timebase) smp_ops->give_timebase(); - /* Wait until cpu puts itself in the online map */ - while (!cpu_online(cpu)) + /* Wait until cpu puts itself in the online & active maps */ + while (!cpu_online(cpu) || !cpu_active(cpu)) cpu_relax(); return 0; -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/