Date: Thu, 24 Nov 2005 14:39:07 +0100
From: Andi Kleen
To: "Eric W. Biederman"
Cc: Andi Kleen, Alan Cox, Gerd Knorr, Linus Torvalds, Dave Jones, Zachary Amsden, Pavel Machek, Andrew Morton, Linux Kernel Mailing List, "H. Peter Anvin", Zwane Mwaikambo, Pratap Subrahmanyam, Christopher Li, Ingo Molnar
Subject: Re: [patch] SMP alternatives

> I think I see the source of the confusion. Scrubbing is the
> process of taking data that is correctable and writing it back to
> memory so that if a second correctable error occurs the net is still
> corrected.

That's supposed to be done by hardware, no? At least the K8 has a hardware
scrubber (although it's not always enabled).

> Directed killing of processes is something that must be done
> inside a synchronous exception (like a machine check) because otherwise
> it is so racy you don't know who has seen the bad data.

If you try to do it this way, the code will become such a mess (if it is
possible to write at all) that your chances of merging it and getting it
right are very slim.

The only sane way to do all the locking etc. is to hand the handling over
to a thread. While that makes the window for misusing the data wider, it's
the only sane alternative to not doing it at all. Also, because of the way
the hardware works (machine checks are usually asynchronous and not
precise), you have that window anyway, so it's not even worse.

Also consider multiple CPUs.

-Andi