Date: Wed, 17 Jun 2009 17:55:32 +0800
From: Wu Fengguang
To: Nick Piggin
Cc: Andi Kleen, Balbir Singh, Andrew Morton, LKML, Ingo Molnar,
    Mel Gorman, Thomas Gleixner, "H. Peter Anvin", Peter Zijlstra,
    Hugh Dickins, riel@redhat.com, chris.mason@oracle.com, linux-mm@kvack.org
Subject: Re: [RFC][PATCH] HWPOISON: only early kill processes who installed SIGBUS handler
Message-ID: <20090617095532.GA25001@localhost>

On Wed, Jun 17, 2009 at 04:04:04PM +0800, Nick Piggin wrote:
> On Wed, Jun 17, 2009 at 02:37:02PM +0800, Wu Fengguang wrote:
> > On Mon, Jun 15, 2009 at 10:22:25PM +0800, Wu Fengguang wrote:
> > > On Mon, Jun 15, 2009 at 08:25:28PM +0800, Nick Piggin wrote:
> > > > On Mon, Jun 15, 2009 at 08:10:01PM +0800, Wu Fengguang wrote:
> > > > > On Mon, Jun 15, 2009 at 03:19:07PM +0800, Nick Piggin wrote:
> > > > > > > For KVM you need early kill; for the others it remains to be seen.
> > > > > >
> > > > > > Right. It's almost like you need to do a per-process thing, and
> > > > > > those that can handle things (such as the new SIGBUS or the new
> > > > > > EIO) could get those, and others could be killed.
> > > > >
> > > > > To send early SIGBUS kills to processes that have called
> > > > > sigaction(SIGBUS, ...)? KVM will surely do that. For other apps, we
> > > > > don't mind whether they can understand that signal at all.
> > > >
> > > > For apps that hook into SIGBUS for some other means and
> > >
> > > Yes, I was referring to the sigaction(SIGBUS) apps; the others will
> > > be late killed anyway.
> > >
> > > > do not understand the new type of SIGBUS signal? What about
> > > > those?
> > >
> > > We introduced two new SIGBUS codes:
> > >         BUS_MCEERR_AO=5 for early kill
> > >         BUS_MCEERR_AR=4 for late kill
> > > I'd assume a legacy application will handle them in the same way (both
> > > are unexpected codes to the application).
> > >
> > > We don't care whether the application can be killed by BUS_MCEERR_AO
> > > or BUS_MCEERR_AR depending on its SIGBUS handler implementation.
> > > But if (in the rare case) the handler
> > > - refuses to die on BUS_MCEERR_AR, it may create a busy loop and a
> > >   flood of SIGBUS signals, which is a bug in the application.
> > >   BUS_MCEERR_AO is sent only once and won't lead to busy loops.
> > > - does something that hurts itself (i.e. data safety) on BUS_MCEERR_AO,
> > >   it may well hurt itself the same way on BUS_MCEERR_AR. The latter is
> > >   unavoidable, so the application must be fixed anyway.
> >
> > This patch materializes the automatic early kill idea.
> > It aims to remove the vm.memory_failure_early_kill sysctl parameter.
> >
> > This is mainly a policy change, please comment.
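For illustration only (this is not code from the thread), here is a minimal sketch of a SIGBUS handler that treats the two codes the way the quoted discussion describes. BUS_MCEERR_AO is delivered once, so the handler may return after dealing with the poisoned page. BUS_MCEERR_AR reports an access that already hit poisoned memory, so returning would re-execute that access and raise SIGBUS again, which is the busy loop mentioned above. The names `classify_sigbus()`, `install_sigbus_handler()` and `enum mce_action` are hypothetical. The fallback values 4 and 5 are the ones quoted in the thread and are used only if the C library headers lack the constants.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Fallbacks with the values quoted in the thread, for older headers. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4   /* action required: late kill */
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5   /* action optional: early kill */
#endif

/* Hypothetical policy labels used only by this sketch. */
enum mce_action {
    MCE_OTHER,          /* a pre-existing SIGBUS cause, e.g. BUS_ADRERR */
    MCE_ASYNC_DISCARD,  /* early kill: the poisoned page was not consumed yet */
    MCE_SYNC_FATAL      /* late kill: the access already hit poisoned memory */
};

enum mce_action classify_sigbus(int si_code)
{
    switch (si_code) {
    case BUS_MCEERR_AO:
        return MCE_ASYNC_DISCARD;
    case BUS_MCEERR_AR:
        return MCE_SYNC_FATAL;
    default:
        return MCE_OTHER;
    }
}

static void sigbus_handler(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext;
    if (classify_sigbus(info->si_code) == MCE_ASYNC_DISCARD) {
        /* info->si_addr points into the poisoned page; a real application
         * would drop or reload whatever data it kept there. Returning is
         * fine because BUS_MCEERR_AO is sent only once. */
        return;
    }
    /* BUS_MCEERR_AR (and, conservatively, any other SIGBUS): after a fault,
     * returning re-executes the faulting access and raises SIGBUS again.
     * _exit() is async-signal-safe, exit() is not. */
    _exit(EXIT_FAILURE);
}

int install_sigbus_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = sigbus_handler;
    sa.sa_flags = SA_SIGINFO;   /* deliver a siginfo_t carrying si_code */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

Installing the handler with `sigaction(SIGBUS, ...)` is exactly what would make a process eligible for early kill under the proposed patch. A legacy handler that never looks at `si_code` ends up in the `MCE_OTHER` path for both new codes, matching the "handle them in the same way" assumption above.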
>
> Well then you can still early-kill random apps that did not
> want it, and you may still cause problems if its SIGBUS
> handler does something nontrivial.
>
> Can you use a prctl or something so it can explicitly
> register interest in this?

No, I don't think prctl would be much better.

- If an application wants early or late kill, it can get either one
  with a properly written SIGBUS handler, so the prctl call is redundant.
- If an admin wants to control early/late kill for an unmodified app,
  prctl is as unhelpful as this patch (*).
- prctl can help legacy apps whose SIGBUS handler has trouble with the
  new SIGBUS codes. However, such applications should be rare, and they
  should be fixed. (Why would a handler do something wrong on a newly
  introduced code at all? Shall we stop introducing new codes just
  because some random buggy app cannot handle them?)

So I still prefer this patch, until we come up with a solution that
allows both the app and the admin to change the setting.

(*) Or, if prctl options can be inherited across exec(), we could write
a wrapper tool to set the early kill preference for a legacy app:

        # this_app_can_be_early_killed some_legacy_app --legacy-options

Thanks,
Fengguang
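Here is a minimal sketch of the wrapper tool from the footnote. It assumes that a per-thread prctl option exists and survives exec(). At the time of this mail no such option existed. Later kernels (2.6.32 and newer) provide one as `PR_MCE_KILL` with `PR_MCE_KILL_SET` and `PR_MCE_KILL_EARLY`. Whether that setting is preserved across `execve()` is the same assumption the footnote makes, not something this sketch checks. The function name `exec_with_early_kill()` is hypothetical.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical wrapper helper: request early kill for the calling thread,
 * then replace the process image with the legacy program.
 * Assumes the kill policy survives exec(), as the footnote requires.
 * Returns -1 on failure; on success execvp() does not return. */
int exec_with_early_kill(char *const argv[])
{
    if (argv == NULL || argv[0] == NULL) {
        errno = EINVAL;
        return -1;
    }
    /* PR_MCE_KILL is available since Linux 2.6.32; unused args must be 0. */
    if (prctl(PR_MCE_KILL, (unsigned long)PR_MCE_KILL_SET,
              (unsigned long)PR_MCE_KILL_EARLY, 0UL, 0UL) != 0) {
        perror("prctl(PR_MCE_KILL)");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");   /* reached only if the exec failed */
    return -1;
}
```

A wrapper `main()` would call `exec_with_early_kill(&argv[1])` and exit with a non-zero status if it returns, since a successful `execvp()` never does. As the mail notes, this helps an admin with an unmodified app, but the app's own SIGBUS handler still decides what happens when the signal arrives.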