Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S934047AbYA2TOl (ORCPT ); Tue, 29 Jan 2008 14:14:41 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1765693AbYA2TO1 (ORCPT ); Tue, 29 Jan 2008 14:14:27 -0500 Received: from iabervon.org ([66.92.72.58]:35051 "EHLO iabervon.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S933761AbYA2TO0 (ORCPT ); Tue, 29 Jan 2008 14:14:26 -0500 Date: Tue, 29 Jan 2008 14:14:13 -0500 (EST) From: Daniel Barkalow To: Alan Cox cc: Richard Heck , Gene Heskett , Zan Lynx , Calvin Walton , Linux Kernel Mailing List , Linux ide Mailing list Subject: Re: Problem with ata layer in 2.6.24 In-Reply-To: <20080129184613.16846ae5@core> Message-ID: References: <200801272122.21823.gene.heskett@gmail.com> <1201539043.31293.7.camel@zem> <1201540830.6526.19.camel@localhost> <200801281230.32910.gene.heskett@gmail.com> <479E1D9E.3000900@bobjweil.com> <20080129121201.2f727f5f@core> <20080129184613.16846ae5@core> User-Agent: Alpine 1.00 (LNX 882 2007-12-20) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 2310 Lines: 46 On Tue, 29 Jan 2008, Alan Cox wrote: > > The SCSI error reporting really ought to include a simple interpretation > > of the error for end users ("The drive doesn't support this command" "A > > sector's data got lost" "The drive timed out" "The drive failed" "The > > drive is entirely gone"). There's too much similarity between the message > > you get when you try a SMART test that doesn't apply to the drive and what > > you get when the drive is broken. > > That would be the SCSI verbose messages option. I think the Eric > Youngdale consortium added it about Linux 1.2. Nowdays its always built > that way. I've seen a lot of verbosity out of SCSI messages, but I haven't seen a straightforward interpretation of the problem in there. It's all information useful for debugging, not information useful for system administration. > > And it's possible that the error recovery is suboptimal in some cases. It > > seems to like resetting drives too much; perhaps if it keeps seeing the > > same problem and resetting the drive, it should decide that the drive's > > error reporting is just bad and just ignore that error like the old IDE > > did (but, in this case, after saying what it's doing). > > Nothing like casually praying the users data hasn't gone for a walk is > there. If we don't act on them the users don't report them until > something really bad occurs so that isn't an option. On the other hand, bringing the system down because a device is misbehaving is a poor idea. I've personally recovered most of the data off of a dying drive because the system was willing to let me keep using the drive anyway; IIRC, the drive didn't work at all after a reboot, so I would have lost all the data instead of only a little had the system insisted on a perfectly functioning drive in order to use it at all. There ought to be some middle ground between doing nothing until the computer really breaks and breaking the computer before then, but that's an issue not specific to libata. -Daniel *This .sig left intentionally blank* -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/