Date: Thu, 25 Oct 2012 10:12:26 -0400
From: "Theodore Ts'o"
To: Nix
Cc: Eric Sandeen, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, "J. Bruce Fields", Bryan Schumaker, Peng Tao, Trond.Myklebust@netapp.com, gregkh@linuxfoundation.org, Toralf Förster
Subject: Re: Apparent serious progressive ext4 data corruption bug in 3.6 (when rebooting during umount)
Message-ID: <20121025141226.GC13562@thunk.org>
In-Reply-To: <87y5iv5noq.fsf@spindle.srvr.nix>

I've been thinking about this some more, and if you don't have a lot of time, perhaps the most important test to do is this.
Does the chance of your seeing corrupted files in v3.6.3 go down if you run 3.6.3 with commit 14b4ed22a6 reverted? Keep your current configuration, using nobarrier et al., constant.

If reverting the commit makes things better, then that's what would be most useful to know as soon as possible, since the correct short-term solution is to revert that commit for 3.7-rcX, as well as the 3.6 and 3.5 stable kernels.

We can investigate later whether nobarrier and journal_async_commit seem to make the problem worse, and whether the less common corruption case that you were seeing with 3.6.1 was actually a change which was introduced between 3.3 and 3.4.

But most importantly, even if the bug doesn't show up with the default mount options at all (which explains why Eric and I weren't able to reproduce it), there are probably other users using nobarrier, so if the frequency with which you were seeing corruptions went up significantly between 3.6.1 and 3.6.3, and reverting 14b4ed22a6 brings the frequency back down to what you were seeing with 3.6.1, we should do that ASAP.

Regards,

- Ted
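
For anyone tallying the results of the test described above, here is a minimal sketch in Python of the comparison being asked for: corruption frequency on 3.6.1, on stock 3.6.3, and on 3.6.3 with 14b4ed22a6 reverted, all with the same mount options. The names `Trials` and `revert_looks_warranted`, the factor-of-two reading of "significantly", the tolerance, and the counts in the usage lines are all assumptions for illustration; none of them are data or thresholds from this thread.

```python
from dataclasses import dataclass


@dataclass
class Trials:
    """Reboot-during-umount trials for one kernel build (hypothetical record)."""
    reboots: int     # number of test reboots performed
    corrupted: int   # reboots after which corrupted files were found

    @property
    def rate(self) -> float:
        # Fraction of reboots that produced corruption; 0.0 if nothing was run.
        return self.corrupted / self.reboots if self.reboots else 0.0


def revert_looks_warranted(base: Trials, stock: Trials, reverted: Trials,
                           factor: float = 2.0, tolerance: float = 0.05) -> bool:
    """Return True if stock is markedly worse than base and the revert undoes that.

    `factor` and `tolerance` are arbitrary illustrative thresholds, not
    values taken from the discussion.
    """
    # Did the corruption frequency go up significantly from base to stock?
    went_up = stock.rate > base.rate and stock.rate >= factor * base.rate
    # Does the reverted build come back down to roughly the base frequency?
    came_back = reverted.rate <= base.rate + tolerance
    return went_up and came_back


# Placeholder counts only; substitute real observations.
v3_6_1 = Trials(reboots=20, corrupted=1)
v3_6_3 = Trials(reboots=20, corrupted=8)
v3_6_3_reverted = Trials(reboots=20, corrupted=1)
decision = revert_looks_warranted(v3_6_1, v3_6_3, v3_6_3_reverted)
```

With so few trials a comparison like this is only a rough indication, which is why the mount options and the rest of the configuration need to stay constant across all three builds.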