Subject: Re: [patch] SMP alternatives
From: Alan Cox
To: Andi Kleen
Cc: "Eric W. Biederman", Gerd Knorr, Linus Torvalds, Dave Jones,
    Zachary Amsden, Pavel Machek, Andrew Morton,
    Linux Kernel Mailing List, "H. Peter Anvin", Zwane Mwaikambo,
    Pratap Subrahmanyam, Christopher Li, Ingo Molnar
In-Reply-To: <20051124133907.GG20775@brahms.suse.de>
References: <437B5A83.8090808@suse.de> <438359D7.7090308@suse.de>
    <1132764133.7268.51.camel@localhost.localdomain>
    <20051123163906.GF20775@brahms.suse.de>
    <1132766489.7268.71.camel@localhost.localdomain>
    <20051123165923.GJ20775@brahms.suse.de>
    <1132783243.13095.17.camel@localhost.localdomain>
    <20051124131310.GE20775@brahms.suse.de>
    <20051124133907.GG20775@brahms.suse.de>
Date: Thu, 24 Nov 2005 14:34:07 +0000
Message-Id: <1132842847.13095.105.camel@localhost.localdomain>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 2005-11-24 at 14:39 +0100, Andi Kleen wrote:
> That's supposed to be done by hardware, no?

Varies immensely by system. Where there is a hardware scrubber and it is
enabled it will be used. One nice thing about K8 is that the memory
controller is in the CPU, so they all use the same driver (not yet merged).

> If you try to do it this way then the code will become such
> a mess if not impossible to write that your chances to merge them
> and get it right are very slim.
> The only sane way to do all the locking etc.
> is to hand over the handling to a thread. While that makes the window
> of misusing the data wider it's the only sane alternative vs not
> doing it at all.

It's utterly hideous because the usual 'ECC error' reporting technique
for an uncorrectable error is an NMI. Locks could be in any state at
that point, and even the registers that need to be accessed are across
PCI, where we could be halfway through a PCI configuration cycle.

The -mm EDAC code works on the basic assumption that unrecovered ECC is
a system halter, although that is configurable.
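To make the "hand it over to a thread" idea concrete, here is a minimal
userspace sketch of the pattern under discussion: the NMI-side code takes
no locks and touches no PCI registers; it only drops a record into a
lock-free ring, and a thread picks it up later. This is not the -mm EDAC
code and not a kernel API. Every name here (ecc_event, ecc_nmi_record,
ecc_worker_pop, ECC_SLOTS) is hypothetical, and the sketch assumes a single
producer (one CPU taking the NMI) and a single consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* The NMI side may only use atomics that never fall back to a lock. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "need always-lock-free int atomics");

#define ECC_SLOTS 8u            /* hypothetical ring size, power of two */

struct ecc_event {              /* hypothetical record of one ECC report */
    uint64_t addr;              /* failing physical address */
    bool uncorrectable;         /* true for an unrecovered error */
};

static struct ecc_event ecc_ring[ECC_SLOTS];
static atomic_uint ecc_head;    /* advanced only by the NMI side */
static atomic_uint ecc_tail;    /* advanced only by the worker thread */
static atomic_uint ecc_dropped; /* events lost because the ring was full */

/*
 * NMI side: no locks, no allocation, no register access across PCI.
 * Returns false if the ring is full and the event had to be dropped.
 */
bool ecc_nmi_record(uint64_t addr, bool uncorrectable)
{
    unsigned head = atomic_load_explicit(&ecc_head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&ecc_tail, memory_order_acquire);

    if (head - tail == ECC_SLOTS) {
        atomic_fetch_add_explicit(&ecc_dropped, 1u, memory_order_relaxed);
        return false;
    }
    ecc_ring[head % ECC_SLOTS] = (struct ecc_event){ addr, uncorrectable };
    /* Publish the slot only after its contents are written. */
    atomic_store_explicit(&ecc_head, head + 1u, memory_order_release);
    return true;
}

/*
 * Thread side: copies out the oldest pending event, if any. The caller
 * runs in ordinary context and may take locks or read registers afterwards.
 */
bool ecc_worker_pop(struct ecc_event *out)
{
    unsigned tail = atomic_load_explicit(&ecc_tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&ecc_head, memory_order_acquire);

    if (tail == head)
        return false;           /* nothing pending */
    *out = ecc_ring[tail % ECC_SLOTS];
    /* Release the slot back to the NMI side after the copy. */
    atomic_store_explicit(&ecc_tail, tail + 1u, memory_order_release);
    return true;
}
```

A worker thread would loop on ecc_worker_pop() and do the slow work there.
The cost is the one Andi names: between the NMI and the thread running, the
bad data can still be consumed, which is why treating an unrecovered error
as a system halt remains the conservative default.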