Date: Tue, 21 Jul 2015 22:18:27 -0700
From: Jörn Engel
To: Mike Galbraith
Cc: Spencer Baugh, Don Zickus, Andrew Morton, Ulrich Obergfell, Ingo Molnar, Andrew Jones, chai wen, Chris Metcalf, Stephane Eranian, open list, Spencer Baugh, Joern Engel
Subject: Re: [PATCH] soft lockup: kill realtime threads before panic
Message-ID: <20150722051827.GD23662@Sligo.logfs.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jul 22, 2015 at 06:36:30AM +0200, Mike Galbraith wrote:
> On Tue, 2015-07-21 at 15:07 -0700, Spencer Baugh wrote:
>
> > We have observed cases where the soft lockup detector triggered, but no
> > kernel bug existed. Instead we had a buggy realtime thread that
> > monopolized a cpu. So let's kill the responsible party and not panic
> > the entire system.
>
> If you don't tell the kernel to panic, it won't, and if you don't remove
> its leash (the throttle), your not so tame rt beast won't maul you.

Not sure if this patch is something for mainline, but those two
alternatives have problems of their own. Not panicking on lockups can
leave a system disabled until some human comes around. In many cases
that human will do no better than power-cycle the box. A panic reduces
the downtime.
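The two knobs Mike refers to are the sysctls kernel.softlockup_panic (panic on a soft lockup) and kernel.sched_rt_period_us / kernel.sched_rt_runtime_us (the RT throttle, 950000 us of RT time per 1000000 us period by default; -1 disables it), while kernel.panic decides what happens after a panic (0 hangs forever, a negative value reboots immediately, a positive value reboots after that many seconds). The Python sketch is a toy model, not kernel code: lockup_outcome() and rt_throttle_split() are hypothetical helpers, and the throttle model assumes a single cpu over one period, RT time handed out strictly by RT priority, a hog that never sleeps, and no runtime borrowing between cpus. Under those assumptions it shows both objections: without a panic the box only warns, and the throttle reserves time for non-realtime threads but none for a lower-priority realtime thread.

```python
def lockup_outcome(softlockup_panic: int, panic_timeout: int) -> str:
    """Hypothetical helper: result of a soft lockup, given the values of
    kernel.softlockup_panic and kernel.panic."""
    if not softlockup_panic:
        # The watchdog only prints a warning; the RT hog keeps the cpu.
        return "warn only"
    if panic_timeout == 0:
        # kernel.panic == 0: loop forever after the panic.
        return "panic, then hang until power-cycled"
    if panic_timeout < 0:
        return "panic, then reboot immediately"
    return f"panic, then reboot after {panic_timeout} s"


def rt_throttle_split(period_us: int, runtime_us: int,
                      rt_demands: dict[str, tuple[int, int]]) -> dict[str, int]:
    """Hypothetical toy model of RT throttling for one cpu and one period.

    rt_demands maps a thread name to (rt_priority, demand_us); a higher
    rt_priority runs first, as with SCHED_FIFO. Whatever RT time is left
    unused is reported under "cfs" (non-realtime threads).
    """
    # -1 disables throttling: RT threads may use the whole period.
    budget = period_us if runtime_us < 0 else min(runtime_us, period_us)
    grants = {}
    by_prio = sorted(rt_demands.items(), key=lambda kv: kv[1][0], reverse=True)
    for name, (_prio, demand_us) in by_prio:
        grants[name] = min(demand_us, budget)  # highest priority drains the budget first
        budget -= grants[name]
    grants["cfs"] = period_us - sum(grants.values())
    return grants


# A spinning prio-90 hog next to a prio-10 thread that needs 20 ms per second.
demands = {"hog": (90, 1_000_000), "low_rt": (10, 20_000)}
print(lockup_outcome(0, 0))                             # warn only
print(rt_throttle_split(1_000_000, 950_000, demands))   # {'hog': 950000, 'low_rt': 0, 'cfs': 50000}
print(rt_throttle_split(1_000_000, -1, demands))        # {'hog': 1000000, 'low_rt': 0, 'cfs': 0}
```

With the default throttle, non-realtime threads keep 50 ms per second while the low-priority realtime thread still gets nothing; removing the leash takes away the 50 ms as well.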
And the realtime throttling gives non-realtime threads some minimum
runtime, but does nothing to help low-priority realtime threads. It
also introduces latencies, often just when the load is high and you
would like every available cpu to get through that rough spot.

I don't think we have a good answer to this problem in the mainline
kernel yet.

Jörn

--
To announce that there must be no criticism of the President, or that
we are to stand by the President, right or wrong, is not only
unpatriotic and servile, but is morally treasonable to the American
public.
-- Theodore Roosevelt, Kansas City Star, 1918