Date: Thu, 03 Jun 2010 14:33:03 +0200
Message-ID: <874ohkuwhs.wl%vmayatsk@redhat.com>
From: Vitaly Mayatskikh
To: Andi Kleen
Cc: Vitaly Mayatskikh, Vivek Goyal, linux-kernel@vger.kernel.org, Andrew Morton, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Randy Dunlap
Subject: Re: [PATCH 0/5] kdump: extract log buffer and registers from vmcore on NMI button pressing
In-Reply-To: <87d3w8xy3q.fsf@basil.nowhere.org>
References: <1275464359-1566-1-git-send-email-v.mayatskih@gmail.com> <20100602151611.GA3174@redhat.com> <87iq60a3rh.wl%vmayatsk@redhat.com> <87d3w8xy3q.fsf@basil.nowhere.org>

At Thu, 03 Jun 2010 11:30:01 +0200, Andi Kleen wrote:
> > Obviously, this change doesn't help if 2nd kernel is not able to
> > boot. But there are other problems, which may prevent vmcore to be
> > captured. For example, machine has RAM > HDD and it may save vmcore
> > only over network.
> > If network fails (e.g., due to bugs in NIC drivers
> > or NFS, what is not so rare), and dump capture environment is
> > non-interactive, or it doesn't have development tools like `crash',
> > there's no chance even to guess what has happened.
>
> In this case you don't need NMI, sysrq or some /sys trigger
> is good enough.

Yes, that can be enough if you can still log in. Also, the NMI part is
small and can easily be changed or removed.

> NMI would be only needed if the crash kernel is completely
> hosed too.

That's the case.

> > Other possibilities of failure may include broken RAID controller,
> > HDD, RAM. NMI button in such situations is a last chance to see old
> > log.
>
> The big problem is that the NMI is used by more and more subsystems,
> and several of them tend to eat all NMIs, so the leftovers are less and
> less. Overall I would not consider it reliable.

True. But as a last hope, when nothing else helps, it may still be
worth trying :)

> Also NMI buttons are not actually all that common.

True as well. This feature is generally not for desktop systems, but
for large servers running critical apps. Such servers usually have an
NMI button facility (directly on the front of the chassis or as a
function in remote console software).

> I'm also not sure you really need the analysis in kernel space.
>
> Why not have a user space program that does a quick analysis
> of the previous vmcore and dumps a summary only? In fact
> I suspect crash can already do that.

I agree, that's fine and is usually enough, if it's still possible to
log in to the system and run this utility. What about the scenario where
a console session is available for only one unit in the rack at a time,
the main kernel has crashed, and the dump capture environment is stuck?
The user attaches to that machine but cannot even log in, so the kdump
kernel is probably also semi-dead. He also doesn't see the analysis dump
produced by the utility, because he attached too late to see its output.
--
wbr, Vitaly