Message-ID: <18580.48861.657366.629904@stoffel.org>
Date: Sat, 2 Aug 2008 16:09:01 -0400
From: "John Stoffel"
To: linasvepstas@gmail.com
Cc: "Alistair John Strachan" , linux-kernel@vger.kernel.org
Subject: Re: amd64 sata_nv (massive) memory corruption
In-Reply-To: <3ae3aa420808011951l58da4010r1ff0876f255565b0@mail.gmail.com>
References: <3ae3aa420808011030weadc61fvf6f850f0a4cfcb3e@mail.gmail.com> <200808012319.05038.alistair@devzero.co.uk> <3ae3aa420808011951l58da4010r1ff0876f255565b0@mail.gmail.com>

>>>>> "Linas" == Linas Vepstas writes:

Linas> 2008/8/1 Alistair John Strachan :
>> On Friday 01 August 2008 18:30:34 Linas Vepstas wrote:
>>> Hi,
>>>
>>> I'm seeing strong, easily reproducible (and silent) corruption on a
>>> sata-attached disk drive on an amd64 board. It might be the disk
>>> itself, but I doubt it; googling suggests that its somehow
>>> iommu-related but I cannot confirm this.
>>
>> Nowhere do you explicitly say you have memtest86'ed the RAM.

Linas> It passes memtest86+ just fine.
Linas> The system has been in heavy use doing big science calculations on
Linas> big datasets (multi-gigabyte) for months; these do not get corrupted
Linas> when copied/moved around on the old parallel IDE disk, nor moving/copying
Linas> on an NFS mount to a file server. Only the SATA disk is misbehaving.

Can you post the output of dmesg after a boot, so we can see which
driver is being used? I assume it's the new libata stuff, but maybe
you can turn on debugging in there as well. Stuff like SCSI_DEBUG (in
the SCSI menus) might show us more details here.

Also, have you tried a new SATA cable by any chance? That's obviously
a cheaper path than getting a new disk...

Good luck,
John
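The copy-and-compare check Linas describes (moving multi-gigabyte files around and seeing whether they still match) can be scripted so it is repeatable while swapping cables or toggling debug options. What follows is only a minimal Python sketch, not something anyone in this thread posted: the helper names `sha256_of` and `copy_and_verify` are hypothetical, `dst_dir` is assumed to be a directory on the SATA disk under test, and the `posix_fadvise` call is merely advisory, so some re-reads may still be served from the page cache rather than the disk.

```python
import hashlib
import os
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(src, dst_dir, rounds=3):
    """Copy src into dst_dir `rounds` times and return the indices of the
    rounds whose copy did not match the source's SHA-256 digest."""
    expected = sha256_of(src)
    bad_rounds = []
    for i in range(rounds):
        dst = os.path.join(dst_dir, f"copy-{i}.bin")
        shutil.copyfile(src, dst)
        with open(dst, "rb+") as f:
            # Force the written data out to the device...
            os.fsync(f.fileno())
            # ...then ask the kernel to drop the now-clean cached pages so the
            # re-read below is more likely to hit the disk. This is only advice,
            # and posix_fadvise is not available on every platform.
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        if sha256_of(dst) != expected:
            bad_rounds.append(i)
        os.remove(dst)  # keep only the verdict, not the copies
    return bad_rounds
```

For example, `copy_and_verify("/path/to/big-dataset.bin", "/mnt/sata/scratch", rounds=10)` (hypothetical paths) returning a non-empty list would show that at least one copy came back different. Such a test only detects corruption; it cannot by itself tell a bad cable from a bad disk, controller, or IOMMU problem.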