Date: Tue, 5 Aug 2008 13:29:57 +0100
From: Alan Cox
To: linasvepstas@gmail.com
Cc: "Robert Hancock", "John Stoffel", "Alistair John Strachan", linux-kernel@vger.kernel.org
Subject: Re: amd64 sata_nv (massive) memory corruption

> have EDAC turned on, or something ... I'm investigating now.
> But this is moot -- if there is software that already exists that
> could have reported the error to the kernel, then this software
> should have been installed/enabled/operating by default.

That gets you into arguments with the people who care about performance,
but it's really a distribution-level debate, and I suspect the answer is
itself distro-specific, depending on usage.

> Personally I'm ready to pop $$$ for ECC if it will actually do
> something for me, this has been painful.

On a decent system ECC will do something.
A modern server PC actually has pretty good error-checking coverage on
the CPU L1 and L2 caches and, optionally, RAM. I/O controllers and disk
internal caches seem to be a bit more variable, which is one reason big
HPC cluster projects often checksum end to end - when you produce
terabytes of data, all the one-in-a-hundred-billion error stats start to
look less than reassuring.

Alan
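
The end-to-end checksumming mentioned above can be sketched roughly as follows. This is only an illustration, not anything taken from a particular HPC project: the helper names (`write_with_checksum`, `verify`, `demo`), the choice of SHA-256, and the file name are all assumptions made for the example. The producer computes a digest from the data while it is still in memory, and the consumer recomputes it from whatever the storage path hands back, so corruption introduced by a controller or a disk cache in between shows up as a mismatch.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_with_checksum(path, data):
    """Producer side (hypothetical helper): write data, return the digest
    computed from the in-memory copy, before it touches the I/O path."""
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
    return expected


def verify(path, expected):
    """Consumer side (hypothetical helper): re-read and compare digests."""
    return sha256_of_file(path) == expected


def demo():
    """Write a file, verify it, flip one bit on disk, verify again."""
    data = os.urandom(4096)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "result.bin")
        expected = write_with_checksum(path, data)
        ok_before = verify(path, expected)
        # Simulate a single flipped bit somewhere in the storage path.
        with open(path, "r+b") as f:
            f.seek(100)
            byte = f.read(1)
            f.seek(100)
            f.write(bytes([byte[0] ^ 0x01]))
        ok_after = verify(path, expected)
    return ok_before, ok_after
```

Calling `demo()` should report a match before the bit flip and a mismatch after it. In a real deployment the verification would presumably run on a different node, or at least after the data has left the page cache, since an immediate re-read on the same machine may never touch the disk at all.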