Message-ID: <48589E6C.2030403@goop.org>
Date: Tue, 17 Jun 2008 22:34:36 -0700
From: Jeremy Fitzhardinge
To: Mike Travis
CC: Ingo Molnar, Andrew Morton, Christoph Lameter, David Miller,
    Eric Dumazet, linux-kernel@vger.kernel.org, the arch/x86 maintainers
Subject: Re: [crash, bisected] Re: [PATCH 3/4] x86_64: Fold pda into per cpu area
In-Reply-To: <48493861.3050402@sgi.com>

Mike Travis wrote:
> Jeremy Fitzhardinge wrote:
>> Mike Travis wrote:
>>> Ingo Molnar wrote:
>>>> * Mike Travis wrote:
>>>>
>>>>>   * Declare the pda as a per cpu variable.
>>>>>
>>>>>   * Make the x86_64 per cpu area start at zero.
>>>>>
>>>>>   * Since the pda is now the first element of the per_cpu area, cpu_pda()
>>>>>     is no longer needed and per_cpu() can be used instead.  This also makes
>>>>>     the _cpu_pda[] table obsolete.
>>>>>
>>>>>   * Since %gs is pointing to the pda, it will then also point to the per cpu
>>>>>     variables and can be accessed thusly:
>>>>>
>>>>>         %gs:[&per_cpu_xxxx - __per_cpu_start]
>>>>>
>>>>> Based on linux-2.6.tip
>>>>
>>>> -tip testing found an instantaneous reboot crash on 64-bit x86, with
>>>> this config:
>>>>
>>>>   http://redhat.com/~mingo/misc/config-Thu_Jun__5_11_43_51_CEST_2008.bad
>>>>
>>>> there is no boot log as the instantaneous reboot happens before
>>>> anything is printed to the (early-) serial console. I have bisected
>>>> it down to:
>>>>
>>>> | 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f is first bad commit
>>>> | commit 7670dc09e89a2b151a1cf49eccebc07c41c2ce9f
>>>> | Author: Mike Travis
>>>> | Date:   Tue Jun 3 17:30:21 2008 -0700
>>>> |
>>>> |     x86_64: Fold pda into per cpu area
>>>>
>>>> the big problem is not just this crash, but that the patch is _way_
>>>> too big:
>>>>
>>>>  arch/x86/Kconfig                 |    3 +
>>>>  arch/x86/kernel/head64.c         |   34 ++++++--------
>>>>  arch/x86/kernel/irq_64.c         |   36 ++++++++-------
>>>>  arch/x86/kernel/setup.c          |   90 ++++++++++++---------------------------
>>>>  arch/x86/kernel/setup64.c        |    5 --
>>>>  arch/x86/kernel/smpboot.c        |   51 ----------------------
>>>>  arch/x86/kernel/traps_64.c       |   11 +++-
>>>>  arch/x86/kernel/vmlinux_64.lds.S |    1
>>>>  include/asm-x86/percpu.h         |   48 ++++++--------------
>>>>  9 files changed, 89 insertions(+), 190 deletions(-)
>>>>
>>>> considering the danger involved, this is just way too large, and
>>>> there's no reasonable debugging i can do in the bisection to narrow
>>>> it down any further.
>>>>
>>>> Please resubmit with the bug fixed and with a proper splitup, the
>>>> more patches you manage to create, the better. For a dangerous code
>>>> area like this, with a track record of frequent breakages in the
>>>> past, i would not mind a "one line of code changed per patch" splitup
>>>> either. (Feel free to send a git tree link for us to try as well.)
>>>>
>>>>     Ingo
>>>
>>> Thanks for the feedback Ingo.  I'll test the above config and look at
>>> splitting up the patch.  The difficulty is making each patch independently
>>> compilable and testable.
>>
>> FWIW, I'm getting past the "crashes very, very early" stage with this
>> series applied when booting under Xen.  Then it crashes pretty early,
>> but that's not your fault...
>>
>>     J
>
> Hi Jeremy,
>
> Yes we have a simulator for Nehalem that also breezes past the boot up
> problem (actually makes it to the kernel login prompt.)  Weirdly, the
> problem doesn't exist in an earlier code base so my changes are tickling
> something else newly introduced.  I'm attempting to see if I can use
> GRUB 2 with the GDB stubs to track it down (which is time consuming in
> itself to set up.)
>
> It is definitely related to basing percpu variable offsets from %gs and
> (I think) interrupts.

Hi Mike,

Have you made any progress on this?  I'm bumping up against it when I
run on native hardware (as opposed to under Xen).

    J