From: Dave Chinner
Subject: Re: BUG in ext4 with 2.6.37-rc1
Date: Thu, 4 Nov 2010 09:56:46 +1100
Message-ID: <20101103225646.GC9169@dastard>
References: <20101102202013.GA3861@elliptictech.com> <4CD1A67D.5060909@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org
To: Eric Sandeen
Return-path:
Content-Disposition: inline
In-Reply-To: <4CD1A67D.5060909@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Wed, Nov 03, 2010 at 02:14:21PM -0400, Eric Sandeen wrote:
> On 11/2/10 4:20 PM, Nick Bowler wrote:
> > The following BUG occurred today while compiling gcc, with 2.6.37-rc1+.
> > More precisely, commit 7fe19da4ca38 ("preempt: fix kernel build with
> > !CONFIG_BKL") with http://permalink.gmane.org/gmane.linux.nfs/36521
> > applied on top.  It basically took out the whole system.
> >
> > ------------[ cut here ]------------
> > kernel BUG at /scratch_space/linux-2.6/fs/ext4/page-io.c:146!
>
> 138 ext4_io_end_t *ext4_init_io_end(struct inode *inode, gfp_t flags)
> 139 {
> 140         ext4_io_end_t *io = NULL;
> 141
> 142         io = kmem_cache_alloc(io_end_cachep, flags);
> 143         if (io) {
> 144                 memset(io, 0, sizeof(*io));
> 145                 io->inode = igrab(inode);
> 146                 BUG_ON(!io->inode);
>
> igrab can fail if it's being torn down:
>
>         /*
>          * Handle the case where s_op->clear_inode is not been
>          * called yet, and somebody is calling igrab
>          * while the inode is getting freed.
>          */
>         inode = NULL;
>
> and boom.

Oh, nasty.

FWIW, the XFS code this was copied from doesn't have this problem
because the struct inode is not tagged for reclaim in
->destroy_inode until all writeback IO is completed. We keep a
separate active ioend reference count in the struct xfs_inode, and
the inode is never freed while there are still active IO references
(see the xfs_ioend_wait() call in xfs_fs_destroy_inode).
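To make that lifecycle rule concrete, here is a rough userspace model of
it. This is only a sketch and not the XFS code: struct toy_inode and the
toy_*() helpers are made-up names, it is single-threaded, and where the
real teardown path would wait for outstanding IO, the model just
refuses to free the inode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not the kernel's struct xfs_inode. */
struct toy_inode {
	int  active_ioends;	/* writeback IOs still in flight */
	bool freed;		/* set once teardown has completed */
};

/* Writeback submission: count the IO. No igrab()-style reference is taken. */
static void toy_ioend_start(struct toy_inode *ip)
{
	assert(!ip->freed);	/* teardown cannot have finished yet */
	ip->active_ioends++;
}

/* IO completion: drop the count taken in toy_ioend_start(). */
static void toy_ioend_finish(struct toy_inode *ip)
{
	assert(ip->active_ioends > 0);
	ip->active_ioends--;
}

/*
 * Teardown. Assumption: a real implementation would sleep here until
 * the count drops to zero; this model returns false instead of waiting.
 */
static bool toy_destroy_inode(struct toy_inode *ip)
{
	if (ip->active_ioends != 0)
		return false;	/* IO still active, inode must stay */
	ip->freed = true;
	return true;
}
```

With two IOs started, toy_destroy_inode() keeps returning false until
both have been finished, so the completion side never sees a freed
inode and never needs a reference of its own.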
Hence the XFS ->writepage path does not need to take inode references
to handle the possibility of an inode being freed from under it, because
the inode lifecycle model guarantees that cannot occur.

Perhaps ext4 needs to copy more from XFS.... ;)

BTW, io_end_cachep probably should be backed by a mempool (like the
equivalent XFS ioend slab cache); otherwise ext4 won't be able to make
writeback progress in OOM conditions. A mempool would also avoid the
need to handle ENOMEM errors in ->writepage.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com