From: ebiederm@xmission.com (Eric W. Biederman)
To: Vivek Goyal
Cc: dwalker@fifo99.com, Hidehiro Kawai, Andrew Morton,
	linux-mips@linux-mips.org, Baoquan He, linux-sh@vger.kernel.org,
	linux-s390@vger.kernel.org, kexec@lists.infradead.org,
	linux-kernel@vger.kernel.org, Ingo Molnar, HATAYAMA Daisuke,
	Masami Hiramatsu, linuxppc-dev@lists.ozlabs.org,
	linux-metag@vger.kernel.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available
Date: Tue, 14 Jul 2015 12:06:15 -0500
Message-ID: <87a8uyoeig.fsf@x220.int.ebiederm.org>
In-Reply-To: <20150714161612.GH10792@redhat.com> (Vivek Goyal's message of
	"Tue, 14 Jul 2015 12:16:12 -0400")
References: <20150710113331.4368.10495.stgit@softrs>
	<20150710113331.4368.63745.stgit@softrs>
	<87wpy82kqf.fsf@x220.int.ebiederm.org>
	<20150713202611.GA16525@fifo99.com>
	<87h9p7r0we.fsf@x220.int.ebiederm.org>
	<20150714135919.GA18333@fifo99.com>
	<20150714150208.GD10792@redhat.com>
	<20150714153430.GA18766@fifo99.com>
	<20150714154040.GA3912@redhat.com>
	<20150714154833.GA18883@fifo99.com>
	<20150714161612.GH10792@redhat.com>

Vivek Goyal writes:

> On Tue, Jul 14, 2015 at 03:48:33PM +0000, dwalker@fifo99.com wrote:
>> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
>> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, dwalker@fifo99.com wrote:
>> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
>> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, dwalker@fifo99.com wrote:
>> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
>> > > > > > dwalker@fifo99.com writes:
>> > > > > >
>> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
>> > > > > > >> Hidehiro Kawai writes:
>> > > > > > >>
>> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
>> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
>> > > > > > >> > However, it doesn't make sense if kdump is not available. In that
>> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
>> > > > > > >> > you can't change the value of the parameter.
>> > > > > > >>
>> > > > > > >> Nacked-by: "Eric W. Biederman"
>> > > > > > >
>> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
>> > > > > >
>> > > > > > It would be less insane, however it still makes no sense as without
>> > > > > > kexec on panic support crash_kexec is a noop. So the value of the
>> > > > > > setting makes no difference.
>> > > > >
>> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
>> > > > > the whole "crash_kexec_post_notifiers" feature has no value?
>> > > >
>> > > > Daniel,
>> > > >
>> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
>> > > > without it?
>> > >
>> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
>> > > over from the crashed machine to another redundant machine.
>> >
>> > So why not detect failure using polling or issue notifications from second
>> > kernel.
>> >
>> > IOW, expecting that a crashed machine will be able to deliver notification
>> > reliably is flawed to begin with, IMHO.
>>
>> It's flawed to think you can kexec, but you still do it, right? I've not gotten into
>> the deep details of this switching process, but that's how this interface is used.
>
> Sure. But the deal here is that users of interface know that sometimes it
> can be unreliable. And in the absence of more reliable mechanism, somewhat
> less reliable mechanism is fine.
>
>>
>> > If a machine is failing, there is a high chance it can't deliver you the
>> > notification. Detecting that failure using some kind of polling mechanism
>> > might be more reliable. And it will make even kdump mechanism more
>> > reliable so that it does not have to run panic notifiers after the crash.
>>
>> I think what you're suggesting is that my company should change how its hardware works
>> and that's not really an option for me. This isn't a simple thing like checking over the
>> network if the machine is down or not, this is way more complex hardware design.
>
> That means you are ready to live with an unreliable design. There might be
> cases where notifier does not get run properly and you will not do switch
> despite the fact that OS has failed. I was just trying to nudge you in
> a direction which could be more reliable mechanism.

Sigh, I see some deep confusion going on here.

The panic notifiers are just that: panic notifiers. They have not been,
nor should they be, tied to kexec. If those notifiers force a switchover
between machines, I fail to see why you would care whether it was kexec
or another panic situation that is forcing that switchover.

Now if you want a reliable design, I strongly recommend, as I have been
recommending for 15 years, that magic failover code be placed in either
the new kernel or a stub that precedes the new kernel. That gives the
greatest reliability we know how to engineer, and it lets you do whatever
you need to do.

Especially if it is not desirable for the panic notifiers to run without
the presence of kexec, I very strongly recommend not using them at all
and just writing a stub of code that can run before a new kernel starts.

Eric