Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1764381AbZAWBcy (ORCPT ); Thu, 22 Jan 2009 20:32:54 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1762317AbZAWBV7 (ORCPT ); Thu, 22 Jan 2009 20:21:59 -0500 Received: from ipmail01.adl6.internode.on.net ([203.16.214.146]:63684 "EHLO ipmail01.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1762309AbZAWBV6 (ORCPT ); Thu, 22 Jan 2009 20:21:58 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: Ai4EAOKpeEl5LAUpgWdsb2JhbACTegEBFiK2U4Vz X-IronPort-AV: E=Sophos;i="4.37,309,1231075800"; d="scan'208";a="273453583" Date: Fri, 23 Jan 2009 12:10:42 +1100 From: Dave Chinner To: Eric Sesterhenn , Christoph Hellwig , Nick Piggin , Pavel Machek , Chris Mason , linux-kernel@vger.kernel.org, npiggin@yahoo.com.au, xfs@oss.sgi.com Subject: Re: [PATCH] Re: Corrupted XFS log replay oops. Message-ID: <20090123011042.GD32390@disturbed> Mail-Followup-To: Eric Sesterhenn , Christoph Hellwig , Nick Piggin , Pavel Machek , Chris Mason , linux-kernel@vger.kernel.org, npiggin@yahoo.com.au, xfs@oss.sgi.com References: <20090113142147.GE16333@alice> <20090120173455.GC21339@alice> <20090121035703.GH10158@disturbed> <200901211503.07308.nickpiggin@yahoo.com.au> <20090122043747.GU10158@disturbed> <20090122061158.GA31104@infradead.org> <20090122100648.GA16660@alice> <20090122233717.GB32390@disturbed> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090122233717.GB32390@disturbed> User-Agent: Mutt/1.5.18 (2008-05-17) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1854 Lines: 53 On Fri, Jan 23, 2009 at 10:37:17AM +1100, Dave Chinner wrote: > On Thu, Jan 22, 2009 at 11:06:48AM +0100, Eric Sesterhenn wrote: > > * Christoph Hellwig (hch@infradead.org) wrote: > > > On Thu, Jan 22, 2009 at 03:37:47PM +1100, Dave Chinner wrote: > > > > xfs_buf_t * > > > > xlog_get_bp( > > > > xlog_t *log, > > > > - int num_bblks) > > > > + int nbblks) > > > > > > Any reason for reanming this variable? That causes quite a bit of > > > churn. > > > > > > > { > > > > - ASSERT(num_bblks > 0); > > > > + if (nbblks <= 0 || nbblks > log->l_logBBsize) { > > > > + xlog_warn("XFS: Invalid block length (0x%x) given for buffer", nbblks); > > > > > > And doesn't prevent this line from needing a linebreak to stay under 80 > > > characters :) > > > > > > Except for these nitpicks it looks fine to me. > > > > Using the image at http://www.cccmz.de/~snakebyte/xfs.254.img.bz2 > > I was able to produce a pretty similar error with the patch applied > > Different problem, obviously. ;) > > I'll have a look at this later today.... One word: Ouch. Basically the corruption introduced adds random feature bits into the superblock that aren't actually in use. And hence instead of having valid superblock fields for each of those features, they are zero and so strange stuff happens. What is really stupid is that the fields are often checked. By ASSERT(), not by production code so a debug kernel will pick up the problem and panic, while a production kernel will continue onwards until it panics. This is not going to be a small patch..... Cheers, Dave. -- Dave Chinner david@fromorbit.com -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/