Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752770AbYKLDkP (ORCPT ); Tue, 11 Nov 2008 22:40:15 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751468AbYKLDkB (ORCPT ); Tue, 11 Nov 2008 22:40:01 -0500 Received: from ozlabs.org ([203.10.76.45]:44930 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751129AbYKLDkA (ORCPT ); Tue, 11 Nov 2008 22:40:00 -0500 From: Rusty Russell To: Ingo Molnar Subject: Re: [Bug #11989] Suspend failure on NForce4-based boards due to chanes in stop_machine Date: Wed, 12 Nov 2008 14:09:49 +1030 User-Agent: KMail/1.10.1 (Linux/2.6.27-7-generic; KDE/4.1.2; i686; ; ) Cc: "Rafael J. Wysocki" , Heiko Carstens , Linux Kernel Mailing List , Kernel Testers List , Vegard Nossum , Peter Zijlstra , Oleg Nesterov , Dmitry Adamushko , Andrew Morton References: <200811102355.42389.rjw@sisk.pl> <20081111105214.GA15645@elte.hu> In-Reply-To: <20081111105214.GA15645@elte.hu> MIME-Version: 1.0 Content-Disposition: inline Message-Id: <200811112256.58467.rusty@rustcorp.com.au> Content-Type: text/plain; charset="iso-8859-1" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by alpha id mAC3eadq024367 Content-Length: 2922 Lines: 9 On Tuesday 11 November 2008 21:22:14 Ingo Molnar wrote:> * Rafael J. Wysocki wrote:> > So, it evidently fails while re-enabling the non-boot CPU and not> > during disabling it as I thought before. (Resend, due to HTML version previously) But what is calling stop_machine in that path? There *is* a race, but I don't think it could cause this (we should make acopy of active.fnret inside the lock before returning it). Two patches: one fixes that race, the next adds debugging spew. stop_machine: fix race with return value We should not access active.fnret outside the lock; in theory the nextstop_machine could overwrite it. Signed-off-by: Rusty Russell --- kernel/stop_machine.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff -r d7c9a15da615 kernel/stop_machine.c--- a/kernel/stop_machine.c Mon Nov 10 09:47:45 2008 +1100+++ b/kernel/stop_machine.c Tue Nov 11 23:19:47 2008 +1030@@ -112,7 +112,7 @@ int __stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus) { struct work_struct *sm_work;- int i;+ int i, ret; /* Set up initial state. */ mutex_lock(&lock);@@ -137,8 +137,9 @@ /* This will release the thread on our CPU. */ put_cpu(); flush_workqueue(stop_machine_wq);+ ret = active.fnret; mutex_unlock(&lock);- return active.fnret;+ return ret; } int stop_machine(int (*fn)(void *), void *data, const cpumask_t *cpus)===diff -r fe7dd39b1cff kernel/stop_machine.c--- a/kernel/stop_machine.c Wed Nov 12 14:07:18 2008 +1030+++ b/kernel/stop_machine.c Wed Nov 12 14:09:08 2008 +1030@@ -89,6 +89,8 @@ case STOPMACHINE_RUN: /* On multiple CPUs only a single error code * is needed to tell that something failed. */+ printk("stop_machine: %i running %p\n",+ smp_processor_id(), smdata->fn); err = smdata->fn(smdata->data); if (err) smdata->fnret = err;@@ -106,6 +108,7 @@ /* Callback for CPUs which aren't supposed to do anything. */ static int chill(void *unused) {+ printk("stop_machine: %i chilling\n", smp_processor_id()); return 0; } @@ -126,17 +129,23 @@ set_state(STOPMACHINE_PREPARE); + printk("stop_machine: running on %i cpus:\n", num_threads);+ dump_stack();+ /* Schedule the stop_cpu work on all cpus: hold this CPU so one * doesn't hit this CPU until we're ready. */ get_cpu(); for_each_online_cpu(i) {+ printk("stop_machine: setting up cpu %i\n", i); sm_work = percpu_ptr(stop_machine_work, i); INIT_WORK(sm_work, stop_cpu); queue_work_on(i, stop_machine_wq, sm_work); } /* This will release the thread on our CPU. */+ printk("stop_machine: releasing CPU %i\n", smp_processor_id()); put_cpu(); flush_workqueue(stop_machine_wq);+ printk("stop_machine: done\n"); ret = active.fnret; mutex_unlock(&lock); return ret;????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?