From: "Luck, Tony"
To: "Theodore Ts'o"
CC: Naoya Horiguchi, "Kleen, Andi", "Wu, Fengguang", Andrew Morton, Jan Kara, "Jun'ichi Nomura", Akira Fujita, "linux-kernel@vger.kernel.org", "linux-mm@kvack.org", "linux-ext4@vger.kernel.org"
Subject: RE: [PATCH 2/3] ext4: introduce ext4_error_remove_page
Date: Fri, 26 Oct 2012 22:24:23 +0000
Message-ID: <3908561D78D1C84285E8C5FCA982C28F19D5A388@ORSMSX108.amr.corp.intel.com>
References: <1351177969-893-1-git-send-email-n-horiguchi@ah.jp.nec.com> <1351177969-893-3-git-send-email-n-horiguchi@ah.jp.nec.com> <20121026061206.GA31139@thunk.org> <3908561D78D1C84285E8C5FCA982C28F19D5A13B@ORSMSX108.amr.corp.intel.com> <20121026184649.GA8614@thunk.org>
In-Reply-To: <20121026184649.GA8614@thunk.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> Well, we could set a new attribute bit on the file which indicates
> that the file has been corrupted, and this could cause any attempts to
> open the file to return some error until the bit has been cleared.

That sounds a lot better than renaming/moving the file.

> This would persist across reboots.
> The only problem is that system
> administrators might get very confused (at least at first, when they
> first run a kernel or a distribution which has this feature enabled).

Yes. This would require some education. But new attributes have been added
in the past (e.g. immutable) that caused confusion to users and tools
that didn't know about them.

> Application programs could also get very confused when any attempt to
> open or read from a file suddenly returned some new error code (EIO,
> or should we designate a new errno code for this purpose, so there is
> a better indication of what the heck was going on?)

EIO sounds wrong ... but it is perhaps the best of the existing codes.
Adding a new one is challenging too.

> Also, if we just log the message in dmesg, if the system administrator
> doesn't find the "this file is corrupted" bit right away

This is pretty much a given. Nobody will see the message in the console
log until it is far too late.

> I'm not sure it's worth it to go to these extents, but I could imagine
> some customers wanting to have this sort of information. Do we know
> what their "nice to have" / "must have" requirements might be?

18 years ago Intel rather famously attempted to sell users on the idea that
a rare divide error that sometimes gave the wrong answer could be ignored.
Before my time at Intel, but it is still burned into the corporate psyche
that customers really don't like to get the wrong answers from their
computers.

Whether it is worth it may depend on the relative frequency of data being
corrupted this way, compared to all the other ways that it might get
messed up. If it were a thousand times more likely that data got silently
corrupted on its path to media, sitting spinning on the media, and then
back off the drive again - then all this fancy stuff wouldn't make any
real difference. I have no data on the relative error rates of memory
and i/o - so I can't answer this.
-Tony