Subject: Re: Mysterious CFQ crash and RCU
From: Paul Bolle
To: Vivek Goyal
Cc: "Paul E. McKenney", Jens Axboe, linux kernel mailing list
Date: Sat, 04 Jun 2011 14:22:26 +0200
In-Reply-To: <20110603134514.GA31057@redhat.com>
Message-ID: <1307190166.23387.15.camel@t41.thuisdomein>

On Fri, 2011-06-03 at 09:45 -0400, Vivek Goyal wrote:
> PaulB mentioned that crash happened at May 26 10:47:07. I am wondering
> how are we able to sample the data after the crash. I am assuming
> that above data gives information only before crash and does not
> tell us anything about what happened just before crash. What am I missing.

Well, what you called a "CFQ crash" is an Oops (apparently generated by
arch/x86/mm/fault.c:show_fault_oops()). But the traces I posted at the
bugzilla.redhat.com issue for this always end with:
    "Fixing recursive fault but reboot is needed"

(see kernel/exit.c:do_exit()). At that point the system is still
running. Perhaps you run with panic_on_oops on by default (rumor has it
that's an RHEL default) which might make the result of this Oops
surprising.
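For what it's worth, whether a box panics on an Oops or tries to carry on can be read from the panic_on_oops sysctl. A minimal sketch, assuming procfs is mounted in the usual place so the setting shows up as /proc/sys/kernel/panic_on_oops (the function name is just something made up for illustration):

```python
from pathlib import Path
from typing import Optional

# Assumed procfs location of the sysctl; adjust if procfs is mounted elsewhere.
PANIC_ON_OOPS = Path("/proc/sys/kernel/panic_on_oops")


def panic_on_oops_enabled(path: Path = PANIC_ON_OOPS) -> Optional[bool]:
    """Return True if the kernel is set to panic on an Oops, False if it
    tries to continue, or None if the setting cannot be read or parsed."""
    try:
        return int(path.read_text().strip()) != 0
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    state = panic_on_oops_enabled()
    if state is None:
        print("panic_on_oops: could not be read")
    elif state:
        print("panic_on_oops: on (an Oops leads to a panic)")
    else:
        print("panic_on_oops: off (the kernel tries to continue after an Oops)")
```

With the setting off, an Oops kills the offending task but leaves the rest of the system running, which would match what I see here.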

Anyhow, it turns out that my system is suspiciously happy after the
process(es) causing this Oops has (have) finished. See the big friendly
warning I put on top of the message in which I pasted the output of
Paul's script:

> 1) Big friendly warning: the "CFQ crash" that occurred while running
> your script didn't happen in a clean session. Not at all! It actually
> happened after (summarized a bit):
> - two "CFQ crashes" with the patch for Jens' first idea;
> - switching to deadline
> - removing cfq_iosched
> - recompiling cfq-iosched.ko (to revert Jens' patch)
> - installing cfq_iosched.ko
> - inserting cfq_iosched
> - switching back to cfq again

(Yes, putting "CFQ crash" in quotes there was a bit of legalese on my
part.)


Paul Bolle