Date: Sun, 4 Oct 2009 17:10:55 -0700
Subject: Re: system gets stuck in a lock during boot
From: Justin Mattock
To: Ingo Molnar
Cc: Jason Baron, Peter Zijlstra, Li Zefan, Steven Rostedt, Frederic Weisbecker, Linux Kernel Mailing List
In-Reply-To: <20091004174113.GB24418@elte.hu>
References: <4A922F82.9080000@cn.fujitsu.com> <4A9251EB.8040805@gmail.com> <20090825085919.GB14003@elte.hu> <4A94803A.5060408@gmail.com> <20090826073351.GE23435@elte.hu> <4A9549E5.5020002@gmail.com> <20091002211211.GA2633@redhat.com> <20091004174113.GB24418@elte.hu>
X-Mailing-List: linux-kernel@vger.kernel.org

On Sun, Oct 4, 2009 at 10:41 AM, Ingo Molnar wrote:
>
> * Jason Baron wrote:
>
>> On Mon, Sep 07, 2009 at 02:49:44PM -0700, Justin Mattock wrote:
>> > >>
>> > >> * Justin P. Mattock wrote:
>> > >>
>> > >>>
>> > >>> Ingo Molnar wrote:
>> > >>>
>> > >>>>
>> > >>>> * Justin Mattock wrote:
>> > >>>>
>> > >>>>>
>> > >>>>> O.K.
I feel better, deleted
>> > >>>>> my system, and threw in a minimal built system
>> > >>>>> with only the bare essentials to boot.
>> > >>>>> (just to make sure things are correct).
>> > >>>>>
>> > >>>>> unfortunately after building rc6 I'm still hitting
>> > >>>>> this. really am not sure why this is happening.
>> > >>>>>
>> > >>>>
>> > >>>> Could you please double-check the bisection result by doing this:
>> > >>>>
>> > >>>>   git revert af6af30c0f
>> > >>>>
>> > >>>> on the latest kernel and seeing whether that fixes the lockup?
>> > >>>>
>> > >>>> Bisections are very efficient and hence very sensitive as well to
>> > >>>> minimal errors. Just one small mistake near the end of a bisection
>> > >>>> can blame the wrong commit.
>> > >>>>
>> > >>>> So the best way to double-check such 100%-triggerable crashes is to
>> > >>>> do the revert. I tried the revert and it can be done fine here.
>> > >>>>
>> > >>>> [ _If_ that does not fix the bug then to save time you can
>> > >>>>   'backtrack' the bisection, instead of re-doing it completely.
>> > >>>>   I.e. you have your bisection log, re-check the final steps going
>> > >>>>   backwards. Once you find a discrepancy (i.e. a 'bad' point that
>> > >>>>   is 'good' or the other way around), redo the bisection log
>> > >>>>   commands up to that point and continue it up to the end. ]
>> > >>>>
>> > >>>>        Ingo
>> > >>>>
>> > >>>
>> > >>> shoot, I did not see your post here. when looking at my bisect
>> > >>> log, I guess after a git bisect reset it clears?
>> > >>>
>> > >>> Anyways after git bisect had finished I looked manually at the
>> > >>> commits that it had generated the one which I had sent in a post
>> > >>> previously, and this one:
>> > >>>
>> > >>>  9424edc2da097c8589fcc24a72552d33e54be161
>> > >>>
>> > >>
>> > >> (this commit has no effect on your kernel image, at all.)
>> > >>
>> > >
>> > > yep. but it was worth a try.
>> > >>>
>> > >>> at the time looking at the commit, I see this to be more of the
>> > >>> cause because of it being related to elf as so forth, but as soon
>> > >>> as I reverted this on rc6 made no difference.(the previous commit
>> > >>> fixes this for me, on a regular tar.ball as well as in git.
>> > >>>
>> > >>> I think at this point since this system is a fresh from scratch
>> > >>> build, I think something might be wrong that I'm doing (all the
>> > >>> CFLAGS, and such are in a previous post).
>> > >>>
>> > >>> At the moment I don't have a problem applying a patch to the
>> > >>> kernel for this. especially since I'm the only one that seems to
>> > >>> be hitting this, then if more and more reports of this happen then
>> > >>> we can go from there.
>> > >>>
>> > >>
>> > >> What would be nice is to verify your bisection end result, i.e. do
>> > >> what i suggested:
>> > >>
>> > >
>> > > yeah I've done this on both kernels three to be exact, and all boot after
>> > > reverting Fix perf-tracepoint OOPS.
>> > >
>> > > As for my system, I'm still convinced that I might be doing something wrong
>> > > over here.
>> > >
>> > >>>> Could you please double-check the bisection result by doing this:
>> > >>>>
>> > >>>>   git revert af6af30c0f
>> > >>>>
>> > >>>> on the latest kernel and seeing whether that fixes the lockup?
>> > >>>>
>> > >>
>> > >> if this doesnt fix it on latest -git then this commit is not the
>> > >> cause of the lockup.
>> > >>
>> > >>        Ingo
>> > >>
>> > >
>> > > This commit(Fix perf-tracepoint OOPS.)does fix my stuckage, but I'm left, as
>> > > well as others asking the question of why.
>> > > In any case I still think I'm setting something wrong with either gcc, or
>> > > something that might be causing this from userland.
>> > >
>> > > Justin P. Mattock
>> > >
>> >
>> > O.k. here something awkward about this issue I was
>> > experiencing.
at the moment I have two imac's
>> > here the descriptions:
>> >
>> > imac A) the one with the problem
>> >
>> > OS: built from the clfs book
>> > x86_64 multilib with only lib64
>> >
>> > built everything with these flags:
>> > CFLAGS="-m64 -mtune=core2 -march=core2
>> > -mfpmath=both -O2 -pipe -fomit-frame-pointer
>> > -fstack-protection"
>> > CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>> > while compiling everything with
>> > gcc version: 4.5.0 20090730
>> >
>> > imac B) the one that works
>> >
>> > OS: clfs(just built a few days ago)
>> > x86_64 pure64 bit build
>> > (lib with a symlink to lib64)
>> > CFLAGS="-m64 -mtune=core2 -march=core2
>> >  -O2 -pipe -fomit-frame-pointer"
>> > CXXFLAGS="${CFLAGS}" MAKEOPTS="{-j3}"
>> > gcc version: 4.4.1 (GCC for Cross-LFS 4.4.1.20090722)
>> >
>> > The only things I can think of is either I hit something
>> > because of gcc, something goes wrong with the libraries,
>> > or there something happening with either the option
>> > of mfpmath=both or stackprotection.
>> >
>> > At this point since the kernel seems to be running fine,
>> > is to just trash the system that has this issue and just leave
>> > it at, I was hitting some weird anomaly.
>> >
>>
>> hi Justin,
>>
>> I've been playing around with gcc '4.5' as well and hit a panic that
>> looks very similar to what you've seen with stock 2.6.31 - I haven't
>> seen it anywhere else. Anyways, it seems to be some sort of alignment
>> issue with the 'struct ftrace_event_call'. I'm not sure yet if this is a
>> compiler or kernel issue. But the following kernel patch fixes the issue
>> for me. It would be interesting to verify if the patch also resolves the
>> issue for you.
>
> Would be nice to know precisely what kind of problem is being hit here -
> we'd like to fix either the kernel or GCC - depending on where the bug
> lies.
>
>        Ingo
>

So I wasn't going crazy....
Anyway, I still have that system (clfs). I can go ahead and put it back
on the machine and see if I hit this again (keep in mind, I just got back
from a 7hr drive, so it might be tomorrow).

-- 
Justin P. Mattock
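
On the bisect-log question raised earlier in the thread: "git bisect reset" does discard the bisect state, log included, so the log has to be saved first if it is going to be re-checked later. Below is a minimal sketch of that save-and-replay workflow. It runs in a throwaway repository, not a kernel tree; the five commits, the BUG marker file and the variable names are made up for illustration only.

```shell
#!/bin/sh
# Illustrative only: a scratch repository stands in for the kernel tree.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git config user.email "demo@example.invalid"
git config user.name "Demo"

# Five commits; the third one adds a marker file standing in for the bug.
for i in 1 2 3 4 5; do
    echo "$i" > file
    if [ "$i" -ge 3 ]; then touch BUG; fi
    git add -A
    git commit -q -m "commit $i"
done
good=$(git rev-list --max-parents=0 HEAD)
bad=$(git rev-parse HEAD)
culprit=$(git rev-parse HEAD~2)

# Bisect automatically: the test command exits 0 (good) while BUG is absent.
git bisect start "$bad" "$good" >/dev/null
git bisect run sh -c '! test -e BUG' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)

# Save the log BEFORE resetting; the reset throws the bisect state away.
git bisect log > "$work/bisect.log"
git bisect reset >/dev/null 2>&1

# Later: replay the saved log (after editing out any step that was marked
# wrongly) instead of redoing the whole bisection by hand.
git bisect replay "$work/bisect.log" >/dev/null 2>&1 || true
replayed_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $first_bad"
```

In a real tree, the 'backtrack' step described above would mean deleting the suspect trailing entries from the saved log before the replay, then continuing with manual "git bisect good" / "git bisect bad" marks from that point.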