Date: Mon, 25 Aug 2008 13:52:23 -0700 (PDT)
From: Linus Torvalds
To: "Alan D. Brunelle"
Cc: "Rafael J. Wysocki", Linux Kernel Mailing List, Kernel Testers List, Andrew Morton, Arjan van de Ven, Rusty Russell
Subject: Re: [Bug #11342] Linux 2.6.27-rc3: kernel BUG at mm/vmalloc.c - bisected

On Mon, 25 Aug 2008, Linus Torvalds wrote:
>
> But I'll look at your vmlinux, see what stands out.

Oops. I already see the problem. Your .config has some _huge_ CPU count,
doesn't it?

checkstack.pl shows these things as the top problems:

	0xffffffff80266234 smp_call_function_mask [vmlinux]:		2736
	0xffffffff80234747 __build_sched_domains [vmlinux]:		2232
	0xffffffff8023523f __build_sched_domains [vmlinux]:		2232
	0xffffffff8021e884 setup_IO_APIC_irq [vmlinux]:			1616
	0xffffffff8021ee24 arch_setup_ht_irq [vmlinux]:			1600
	0xffffffff8021f144 arch_setup_msi_irq [vmlinux]:		1600
	0xffffffff8021e3b0 __assign_irq_vector [vmlinux]:		1592
	0xffffffff8021e626 __assign_irq_vector [vmlinux]:		1592
	0xffffffff8023257e move_task_off_dead_cpu [vmlinux]:		1592
	0xffffffff802326e8 move_task_off_dead_cpu [vmlinux]:		1592
	0xffffffff8025dbc5 tick_handle_oneshot_broadcast [vmlinux]:	1544
	0xffffffff8025dcb4 tick_handle_oneshot_broadcast [vmlinux]:	1544
	0xffffffff803f3dc4 store_scaling_governor [vmlinux]:		1376
	0xffffffff80279ef4 cpuset_write_resmask [vmlinux]:		1360
	0xffffffff803f465d cpufreq_add_dev [vmlinux]:			1352
	0xffffffff803f495b cpufreq_add_dev [vmlinux]:			1352
	0xffffffff803f3fc4 store_scaling_max_freq [vmlinux]:		1328
	0xffffffff803f4064 store_scaling_min_freq [vmlinux]:		1328
	0xffffffff803f44c4 cpufreq_update_policy [vmlinux]:		1328

.. and sys_init_module is actually way way down the list. I bet the only
reason it showed up at all was because dynamically it was such a deep
callchain, and part of that callchain probably called some of those really
nasty things.

Anyway, the reason smp_call_function_mask and friends have such _huge_
stack usages for you is that they contain a 'cpumask_t' on the stack.

For example, for me, using a sane NR_CPUS, the size of the stack frame for
smp_call_function_mask is under 200 bytes. For you, it's 2736 bytes.

How about you make CONFIG_NR_CPUS something _sane_? Like 16? Or do you
really have four thousand CPUs in that system?

Oh, I guess you have the MAXSMP config enabled? I really think that was a
bit too aggressive.

		Linus