Message-ID: <4964CFBB.9050404@gmail.com>
Date: Wed, 07 Jan 2009 07:52:27 -0800
From: "Justin P. Mattock"
To: Frédéric Weisbecker
CC: Heiko Carstens, Linus Torvalds, Andrew Morton, Rusty Russell, Pekka Enberg, linux-kernel@vger.kernel.org, Jeff Chua
Subject: Re: [PATCH] stop_machine/cpu hotplug: fix disable_nonboot_cpus

Frédéric Weisbecker wrote:
> 2009/1/7 Heiko Carstens:
>
>> From: Heiko Carstens
>>
>> disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
>> caller already created the stop_machine workqueue (like cpu_down does).
>> Otherwise a call to stop_machine will lead to accesses to random memory
>> regions.
>>
>> When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85
>> "stop_machine: introduce stop_machine_create/destroy") I missed the second
>> call site of _cpu_down.
>> So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
>> as well.
>>
>> Fixes suspend-to-ram/disk and also this bug:
>>
>> [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
>> [ 286.548940] IP: [] __stop_machine+0x88/0xe3
>> [ 286.550598] Oops: 0002 [#1] SMP
>> [ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
>> [ 286.560580] EIP: is at __stop_machine+0x88/0xe3
>> [ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
>> [ 286.560580] Call Trace:
>> [ 286.560580] [] ? _cpu_down+0x10f/0x234
>> [ 286.560580] [] ? disable_nonboot_cpus+0x58/0xdc
>> [ 286.560580] [] ? kernel_poweroff+0x22/0x39
>> [ 286.560580] [] ? sys_reboot+0xde/0x14c
>> [ 286.560580] [] ? complete_signal+0x179/0x191
>> [ 286.560580] [] ? send_signal+0x1cc/0x1e1
>> [ 286.560580] [] ? _spin_unlock_irqrestore+0x2d/0x3c
>> [ 286.560580] [] ? group_send_signal_info+0x58/0x61
>> [ 286.560580] [] ? kill_pid_info+0x30/0x3a
>> [ 286.560580] [] ? sys_kill+0x75/0x13a
>> [ 286.560580] [] ? mntput_no_expire+ox1f/0x101
>> [ 286.560580] [] ? dput+0x1e/0x105
>> [ 286.560580] [] ? __fput+0x150/0x158
>> [ 286.560580] [] ? audit_syscall_entry+0x137/0x159
>> [ 286.560580] [] ? sysenter_do_call+0x12/0x34
>>
>> Reported-by: "Justin P. Mattock"
>> Reviewed-by: Pekka Enberg
>> Signed-off-by: Heiko Carstens
>> ---
>>  kernel/cpu.c |    6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> Index: linux-2.6/kernel/cpu.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/cpu.c
>> +++ linux-2.6/kernel/cpu.c
>> @@ -379,8 +379,11 @@ static cpumask_var_t frozen_cpus;
>>
>>  int disable_nonboot_cpus(void)
>>  {
>> -	int cpu, first_cpu, error = 0;
>> +	int cpu, first_cpu, error;
>>
>> +	error = stop_machine_create();
>> +	if (error)
>> +		return error;
>>  	cpu_maps_update_begin();
>>  	first_cpu = cpumask_first(cpu_online_mask);
>>  	/* We take down all of the non-boot CPUs in one shot to avoid races
>> @@ -409,6 +412,7 @@ int disable_nonboot_cpus(void)
>>  		printk(KERN_ERR "Non-boot CPUs are not disabled\n");
>>  	}
>>  	cpu_maps_update_done();
>> +	stop_machine_destroy();
>>  	return error;
>>  }
>>
>>
>
>
> That should explain why suspend to disk failed on my box yesterday on
> the processors stage...
> Thanks!
>
>

O.K., applied the patch and shut down the machine a few times; no freeze,
no bug message. Sweet!!
Now I'm gonna try and dismantle a bug message for educational purposes.
Thanks for the assistance.

regards;

Justin P. Mattock
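
For readers following the thread, the pairing that the quoted patch adds can be sketched in user space. This is only an illustration, not the kernel code: every name below (struct workqueue, wq, wq_users, and the sketch_* functions) is hypothetical, the reference count is an assumption of the sketch, and the NULL check in sketch_cpu_down() is added for safety where the oops above shows the unpatched kernel simply dereferencing bad memory.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stop_machine workqueue; not kernel code. */
struct workqueue {
	int jobs_run;
};

static struct workqueue *wq;	/* NULL until created */
static int wq_users;		/* assumed reference count */

/* Plays the role of stop_machine_create(): allocate on first use. */
static int sketch_create(void)
{
	if (wq_users == 0) {
		wq = calloc(1, sizeof(*wq));
		if (!wq)
			return -ENOMEM;
	}
	wq_users++;
	return 0;
}

/* Plays the role of stop_machine_destroy(): free when the last user leaves. */
static void sketch_destroy(void)
{
	assert(wq_users > 0);
	if (--wq_users == 0) {
		free(wq);
		wq = NULL;
	}
}

/* Plays the role of _cpu_down(): the caller must already hold the resource. */
static int sketch_cpu_down(int cpu)
{
	(void)cpu;
	if (!wq)
		return -EINVAL;	/* sketch-only guard against the missing create */
	wq->jobs_run++;
	return 0;
}

/* Mirrors the shape of the patched disable_nonboot_cpus(). */
static int sketch_disable_nonboot_cpus(int ncpus)
{
	int cpu, error;

	error = sketch_create();
	if (error)
		return error;
	for (cpu = 1; cpu < ncpus; cpu++) {
		error = sketch_cpu_down(cpu);
		if (error)
			break;
	}
	sketch_destroy();
	return error;
}
```

Calling sketch_cpu_down() without a preceding sketch_create() fails with -EINVAL, which is the sketch's analogue of the second _cpu_down call site that the patch description says was missed; sketch_disable_nonboot_cpus() succeeds because it brackets the loop with create and destroy.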