Message-ID: <4CDE63D3.4010608@kernel.org>
Date: Sat, 13 Nov 2010 11:09:23 +0100
From: Tejun Heo
To: akataria@vmware.com
CC: linux-mm, LKML, Andrew Morton, Petr Vandrovec, Pekka Enberg, Christoph Lameter
Subject: Re: (mem hotplug, pcpu_alloc) BUG: sleeping function called from invalid context at kernel/mutex.c:94
References: <1289588178.7486.15.camel@ank32.eng.vmware.com>
In-Reply-To: <1289588178.7486.15.camel@ank32.eng.vmware.com>

Hello,

On 11/12/2010 07:56 PM, Alok Kataria wrote:
> We have seen following might_sleep warning while hot adding memory...
>
> [ 142.339267] BUG: sleeping function called from invalid context at kernel/mutex.c:94
> [ 142.339276] in_atomic(): 0, irqs_disabled(): 1, pid: 4, name: migration/0
> [ 142.339283] Pid: 4, comm: migration/0 Not tainted 2.6.35.6-45.fc14.x86_64 #1
> [ 142.339288] Call Trace:
> [ 142.339305] [] __might_sleep+0xeb/0xf0
> [ 142.339316] [] mutex_lock+0x24/0x50
> [ 142.339326] [] pcpu_alloc+0x6d/0x7ee
> [ 142.339336] [] ? load_balance+0xbe/0x60e
> [ 142.339343] [] ? rt_se_boosted+0x21/0x2f
> [ 142.339349] [] ? dequeue_rt_stack+0x18b/0x1ed
> [ 142.339356] [] __alloc_percpu+0x10/0x12
> [ 142.339362] [] setup_zone_pageset+0x38/0xbe
> [ 142.339373] [] ? build_zonelists_node.clone.58+0x79/0x8c
> [ 142.339384] [] __build_all_zonelists+0x419/0x46c
> [ 142.339395] [] ? cpu_stopper_thread+0xb2/0x198
> [ 142.339401] [] stop_machine_cpu_stop+0x8e/0xc5
> [ 142.339407] [] ? stop_machine_cpu_stop+0x0/0xc5
> [ 142.339414] [] cpu_stopper_thread+0x108/0x198
> [ 142.339420] [] ? schedule+0x5b2/0x5cc
> [ 142.339426] [] ? cpu_stopper_thread+0x0/0x198
> [ 142.339434] [] kthread+0x7f/0x87
> [ 142.339443] [] kernel_thread_helper+0x4/0x10
> [ 142.339449] [] ? kthread+0x0/0x87
> [ 142.339455] [] ? kernel_thread_helper+0x0/0x10
> [ 142.340099] Built 5 zonelists in Node order, mobility grouping on. Total pages: 289456
> [ 142.340108] Policy zone: Normal
>
>
> This warning was seen on the FC14 kernel, though looking at the current
> git, the problem seems to exist on mainline too.
> The problem is that pcpu_alloc expects that it is called from non-atomic
> context as a result it grabs the pcpu_alloc_mutex.
> In the memory-hotplug case though, we do end up calling pcpu_alloc from
> atomic context, while all cpus are stopped.
>
> void build_all_zonelists(void *data)
> {
>         set_zonelist_order();
>
>         if (system_state == SYSTEM_BOOTING) {
>                 __build_all_zonelists(NULL);
>                 mminit_verify_zonelist();
>                 cpuset_init_current_mems_allowed();
>         } else {
>                 /* we have to stop all cpus to guarantee there is no user
>                    of zonelist */
>                 stop_machine(__build_all_zonelists, data, NULL);  <=========
>                 /* cpuset refresh routine should be here */
>         }
>
> __build_all_zonelists eventually calls pcpu_alloc.
>
> I didn't dive through the history, so am not sure when was this
> regression introduced, but could have regressed with the new pcpu memory
> allocator.

Meh... the percpu allocator required user context from the beginning.
The new allocator didn't change that.
Wouldn't it be possible to prepare hotplug outside of cpu_stop and use
stop_machine() only to make it available to the system? In general,
it's a very bad idea to allocate memory from inside stop_machine. The
whole machine is stopped, after all. It also shouldn't be too
difficult to add a new resource without stop_machine, unlike
removing one.

Pekka, Christoph, any ideas?

Thanks.

-- 
tejun
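
The "prepare outside, publish inside" idea suggested above can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code and not a patch from this thread: machine_stopped stands in for "running inside stop_machine()", alloc_may_sleep() stands in for an allocator that requires user context (its assert plays the role of the might_sleep check), and struct pageset_table, prepare_table(), publish_table() and hot_add() are hypothetical names invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names exist in the kernel. */
struct pageset_table {
	size_t nr_entries;
	int *entries;
};

static bool machine_stopped;              /* models "inside stop_machine()" */
static struct pageset_table *live_table;  /* what the rest of the system uses */

/*
 * Models an allocator that may sleep. Calling it while the machine is
 * stopped is a bug, which is the condition the might_sleep warning reports.
 */
static void *alloc_may_sleep(size_t nmemb, size_t size)
{
	assert(!machine_stopped);
	return calloc(nmemb, size);
}

/*
 * Step 1: build the new table in ordinary context, where both sleeping
 * and allocation failure can be handled.
 */
static struct pageset_table *prepare_table(size_t nr_entries)
{
	struct pageset_table *t = alloc_may_sleep(1, sizeof(*t));

	if (!t)
		return NULL;
	t->entries = alloc_may_sleep(nr_entries, sizeof(*t->entries));
	if (!t->entries) {
		free(t);
		return NULL;
	}
	t->nr_entries = nr_entries;
	return t;
}

/* Step 2: the stopped section only swaps a pointer; it never allocates. */
static struct pageset_table *publish_table(struct pageset_table *fresh)
{
	struct pageset_table *old;

	machine_stopped = true;
	old = live_table;
	live_table = fresh;
	machine_stopped = false;
	return old;
}

/* Prepare, publish, then release the previous table outside the stop. */
static int hot_add(size_t nr_entries)
{
	struct pageset_table *fresh = prepare_table(nr_entries);
	struct pageset_table *old;

	if (!fresh)
		return -1;
	old = publish_table(fresh);
	if (old) {
		free(old->entries);
		free(old);
	}
	return 0;
}
```

With this split, an allocation failure is reported by hot_add() before anything is stopped, and the stopped section is reduced to a pointer swap. Whether the zonelist rebuild can actually be restructured this way is the open question in the message above.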