Date: Tue, 21 Jul 2015 23:33:23 -0700
From: Jörn Engel
To: Mike Galbraith
Cc: Spencer Baugh, Don Zickus, Andrew Morton, Ulrich Obergfell, Ingo Molnar, Andrew Jones, chai wen, Chris Metcalf, Stephane Eranian, open list, Spencer Baugh, Joern Engel
Subject: Re: [PATCH] soft lockup: kill realtime threads before panic
Message-ID: <20150722063323.GE23662@Sligo.logfs.org>
References: <1437516477-30554-5-git-send-email-sbaugh@catern.com> <1437539790.3106.42.camel@gmail.com> <20150722051827.GD23662@Sligo.logfs.org> <1437543708.3106.70.camel@gmail.com>
In-Reply-To: <1437543708.3106.70.camel@gmail.com>

On Wed, Jul 22, 2015 at 07:41:48AM +0200, Mike Galbraith wrote:
> On Tue, 2015-07-21 at 22:18 -0700, Jörn Engel wrote:
> >
> > Not sure if this patch is something for mainline, but those two
> > alternatives have problems of their own.  Not panicking on lockups can
> > leave a system disabled until some human come around.  In many cases
> > that human will do no better than power-cycle.  A panic reduces the
> > downtime.
>
> If a realtime task goes bonkers, the realtime game is over, you're down.

Agreed.  But a reboot will often solve the issue.  So the automatic
panic will repair the system within minutes, while no panic will leave
the system broken for days, depending on human response time.
Automatic panic is a great way to minimize downtime - or vulnerable time
if you have HA.  One could argue that killing the realtime thread is
even better than panic, as things can restart with a blank slate even
faster.

But the real benefit is that we get better debug data for the failing
component.  If we had a kernel bug, the backtrace would usually be
sufficient to point fingers.  With a bonkers realtime thread, not so
much.

Anyway, this patch has been useful to us once.  If someone deems it
merge-worthy, great.  If not, I won't lose any sleep either.

Jörn

--
The key to performance is elegance, not battalions of special cases.
-- Jon Bentley and Doug McIlroy