Date: Thu, 15 Jul 2010 21:26:36 +1000
From: Dave Chinner
To: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: 2.6.35-r5 ext3 corruptions

On Thu, Jul 15, 2010 at 08:57:45PM +1000, Dave Chinner wrote:
> Upgrading my test vms from 2.6.35-rc3 to 2.6.35-rc5 is resulting in
> repeated errors on the root drive of a test VM:
>
> [ 1532.368808] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
> [ 1532.370859] Aborting journal on device sda1.
> [ 1532.376957] EXT3-fs (sda1):
> [ 1532.376976] EXT3-fs (sda1): error: ext3_journal_start_sb: Detected aborted journal
> [ 1532.376980] EXT3-fs (sda1): error: remounting filesystem read-only
> [ 1532.420361] error: remounting filesystem read-only
> [ 1532.621209] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
>
> The filesystem is a mess when checked on reboot - lots of illegal
> references to blocks, multiply linked blocks, etc., but it repairs.
> Files are lost, truncated, etc., so there is visible filesystem
> damage.
>
> I did lots of testing on 2.6.35-rc3 and came across no problems;
> problems only seemed to start with 2.6.35-rc5, and I've reproduced
> the problem on a vanilla 2.6.35-rc4.
>
> The problem seems to occur randomly - sometimes during boot or when
> idle after boot, sometimes a while after boot. I haven't done any
> digging at all for the cause - all I've done so far is confirm that
> it is reproducible and that it's not my code causing the problem.

FWIW, a warning is triggered a few seconds after an error occurs:

[ 1025.201140] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1025.203062] Aborting journal on device sda1.
[ 1025.217894] EXT3-fs (sda1): error: remounting filesystem read-only
[ 1025.271198] EXT3-fs error (device sda1): ext3_lookup: deleted inode referenced: 211043
[ 1039.116558] ------------[ cut here ]------------
[ 1039.117192] WARNING: at fs/ext3/inode.c:1534 ext3_ordered_writepage+0x213/0x230()
[ 1039.120544] Hardware name: Bochs
[ 1039.121036] Modules linked in: [last unloaded: scsi_wait_scan]
[ 1039.122103] Pid: 1838, comm: flush-8:0 Not tainted 2.6.35-rc5-dgc+ #34
[ 1039.122837] Call Trace:
[ 1039.123320] [] warn_slowpath_common+0x7f/0xc0
[ 1039.123892] [] warn_slowpath_null+0x1a/0x20
[ 1039.124461] [] ext3_ordered_writepage+0x213/0x230
[ 1039.125088] [] __writepage+0x1a/0x50
[ 1039.125652] [] write_cache_pages+0x1f7/0x410
[ 1039.126233] [] ? __writepage+0x0/0x50
[ 1039.126796] [] ? cpuacct_charge+0x9b/0xb0
[ 1039.127371] [] ? cpuacct_charge+0x22/0xb0
[ 1039.127947] [] ? pvclock_clocksource_read+0x58/0xd0
[ 1039.128574] [] generic_writepages+0x27/0x30
[ 1039.129146] [] do_writepages+0x35/0x40
[ 1039.129709] [] writeback_single_inode+0xe4/0x3e0
[ 1039.130290] [] writeback_sb_inodes+0x199/0x2a0
[ 1039.130869] [] writeback_inodes_wb+0x76/0x1a0
[ 1039.131444] [] wb_writeback+0x24b/0x2b0
[ 1039.132001] [] wb_do_writeback+0x17d/0x190
[ 1039.132597] [] bdi_writeback_task+0x57/0x160
[ 1039.133200] [] ? bit_waitqueue+0x17/0xc0
[ 1039.133771] [] ? bdi_start_fn+0x0/0x100
[ 1039.134327] [] bdi_start_fn+0x86/0x100
[ 1039.134876] [] ? bdi_start_fn+0x0/0x100
[ 1039.135435] [] kthread+0x96/0xa0
[ 1039.135970] [] kernel_thread_helper+0x4/0x10
[ 1039.136575] [] ? restore_args+0x0/0x30
[ 1039.137128] [] ? kthread+0x0/0xa0
[ 1039.137701] [] ? kernel_thread_helper+0x0/0x10
[ 1039.138272] ---[ end trace 689f32ae8f9a7104 ]---

Of interest is that it tripped over the same inode number (211043)
again. The inode numbers reported have always been in the ~211000
range.

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com