From: Mike Galbraith
To: Jörn Engel
Cc: Spencer Baugh, Don Zickus, Andrew Morton, Ulrich Obergfell, Ingo Molnar, Andrew Jones, chai wen, Chris Metcalf, Stephane Eranian, open list, Spencer Baugh, Joern Engel
Subject: Re: [PATCH] soft lockup: kill realtime threads before panic
Date: Wed, 22 Jul 2015 07:41:48 +0200

On Tue, 2015-07-21 at 22:18 -0700, Jörn Engel wrote:
> On Wed, Jul 22, 2015 at 06:36:30AM +0200, Mike Galbraith wrote:
> > On Tue, 2015-07-21 at 15:07 -0700, Spencer Baugh wrote:
> >
> > > We have observed cases where the soft lockup detector triggered, but no
> > > kernel bug existed. Instead we had a buggy realtime thread that
> > > monopolized a cpu. So let's kill the responsible party and not panic
> > > the entire system.
> >
> > If you don't tell the kernel to panic, it won't, and if you don't remove
> > its leash (the throttle), your not so tame rt beast won't maul you.
>
> Not sure if this patch is something for mainline, but those two
> alternatives have problems of their own. Not panicking on lockups can
> leave a system disabled until some human come around. In many cases
> that human will do no better than power-cycle. A panic reduces the
> downtime.

If a realtime task goes bonkers, the realtime game is over, you're down.

> And the realtime throttling gives non-realtime threads some minimum
> runtime, but does nothing to help low-priority realtime threads. It
> also introduces latencies, often when workloads are high and you would
> like any available cpu to get through that rough spot.

You can use group scheduling as a debug crutch until the little beasts
are housebroken.

> I don't think we have a good answer to this problem in the mainline
> kernel yet.

IMHO, there's no point in trying to make rt warm/fuzzy/cuddly. Just
don't stuff a Hells Angel into a super-suit, that gets real ugly ;-)

	-Mike
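
For readers following the thread: a minimal sketch of the arithmetic behind "the throttle", assuming it refers to the kernel's global realtime throttling sysctls `sched_rt_runtime_us` and `sched_rt_period_us` (documented defaults 950000 and 1000000, with -1 meaning unthrottled). The helper names `rt_share`, `read_knob` and `describe` are hypothetical and exist only for illustration; nothing here comes from the patch under discussion.

```python
from pathlib import Path
from typing import Optional

# Assumption: the throttle is exposed through these procfs sysctl files.
PROC_SYS_KERNEL = Path("/proc/sys/kernel")


def rt_share(runtime_us: int, period_us: int) -> Optional[float]:
    """Fraction of each period that realtime tasks may consume.

    Returns None when runtime_us is negative, which means the
    throttle is disabled and realtime tasks may use the whole cpu.
    """
    if runtime_us < 0:
        return None
    if period_us <= 0:
        raise ValueError("period_us must be positive")
    return min(runtime_us / period_us, 1.0)


def read_knob(name: str) -> Optional[int]:
    """Read one integer sysctl; None if it is missing or unreadable."""
    try:
        return int((PROC_SYS_KERNEL / name).read_text().strip())
    except (OSError, ValueError):
        return None


def describe(runtime_us: int, period_us: int) -> str:
    """Summarize how much cpu time is left for non-realtime tasks."""
    share = rt_share(runtime_us, period_us)
    if share is None:
        return "throttle disabled: a runaway rt task can monopolize a cpu"
    reserved = (1.0 - share) * 100.0
    return f"rt tasks capped at {share * 100.0:.1f}%, {reserved:.1f}% left for others"


if __name__ == "__main__":
    runtime = read_knob("sched_rt_runtime_us")
    period = read_knob("sched_rt_period_us")
    if runtime is None or period is None:
        print("rt throttle sysctls not readable on this system")
    else:
        print(describe(runtime, period))
```

With the documented defaults this leaves 5% of each one-second period to non-realtime tasks. As the quoted text points out, that reserve goes to non-realtime threads only; a low-priority realtime thread stuck behind a runaway high-priority one gains nothing from it.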