From: "Darrick J. Wong"
Subject: Re: 4.7.0-rc7 ext4 error in dx_probe
Date: Fri, 5 Aug 2016 10:02:28 -0700
Message-ID: <20160805170228.GA19960@birch.djwong.org>
References: <20160718141723.GA8809@sig21.net> <7849bcd2-142d-0a12-0a04-7d0c3b6d788f@etorok.net> <20160805103544.kbt7znbzypvi5ofx@sig21.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit
Cc: =?iso-8859-1?B?VPZy9ms=?= Edwin , linux-kernel@vger.kernel.org, tytso@mit.edu, linux-ext4@vger.kernel.org
To: Johannes Stezenbach
Return-path:
Content-Disposition: inline
In-Reply-To: <20160805103544.kbt7znbzypvi5ofx@sig21.net>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Fri, Aug 05, 2016 at 12:35:44PM +0200, Johannes Stezenbach wrote:
> On Wed, Aug 03, 2016 at 05:50:26PM +0300, Török Edwin wrote:
> > I have just encountered a similar problem after I've recently upgraded to 4.7.0:
> > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: inode #13295: comm python: Directory index failed checksum
> > [Wed Aug 3 11:08:57 2016] Aborting journal on device dm-1-8.
> > [Wed Aug 3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only
> > [Wed Aug 3 11:08:57 2016] EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal
> >
> > I've rebooted in single-user mode, fsck fixed the filesystem, and rebooted, filesystem is rw again now.
> >
> > inode #13295 seems to be this and I can list it now:
> > stat /usr/lib64/python3.4/site-packages
> > File: '/usr/lib64/python3.4/site-packages'
> > Size: 12288 Blocks: 24 IO Block: 4096 directory
> > Device: fd01h/64769d Inode: 13295 Links: 180
> > Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
> > Access: 2016-05-09 11:29:44.056661988 +0300
> > Modify: 2016-08-01 00:34:24.029779875 +0300
> > Change: 2016-08-01 00:34:24.029779875 +0300
> > Birth: -
> >
> > The filesystem was /, I only noticed it was readonly after several hours when I tried to install something:
> > /dev/mapper/vg--ssd-root on / type ext4 (rw,noatime,errors=remount-ro,data=ordered)
> >
> > $ uname -a
> > Linux bolt 4.7.0-gentoo-rr #1 SMP Thu Jul 28 11:28:56 EEST 2016 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux
> >
> > FWIW I've been using ext4 for years and this is the first time I see this message.
> > Prior to 4.7 I was on 4.6.1 -> 4.6.2 -> 4.6.3 -> 4.6.4.
> >
> > The kernel is from gentoo-sources + a patch for enabling AMD LWP (I had that patch since 4.6.3 and its not related to I/O).
> >
> > If I see this message again what should I do to obtain more information to trace down the root cause?
>
> It just happened again to me, this time hitting /usr/sbin/
> on root fs. Meanwhile I ran memtest86 7.0 for two nights,
> it didn't find anything. I'm using hibernate regularly
> and I think so this only happened after a few hibernate/resume
> cycles, but no idea if that means anything.
> Now I'm back at 4.4.16 to see if it reproduces.

When you're back on 4.7, can you apply this patch[1] to see if it fixes the
problem? I speculate that the new parallel dir lookup code enables multiple
threads to be verifying the same directory block buffer at the same time.

--D

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23

>
> Johannes
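
To make the speculation above concrete, here is a minimal sketch of how two
verifiers working on the same block buffer at the same time could produce a
spurious checksum failure. It rests on an assumption that is not stated in
the thread: that the verifier temporarily zeroes the stored checksum field
in place while it recomputes the checksum. The toy byte-sum checksum, the
64-byte block, and every name below (make_block, unsafe_begin, unsafe_end,
safe_verify, replay_race) are hypothetical and are not taken from ext4 or
from the patch in [1]. One interleaving is replayed in a single thread so
the outcome is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK_SIZE 64                             /* toy block size (assumption) */
#define CSUM_OFF (BLK_SIZE - sizeof(uint32_t))  /* checksum sits in the last 4 bytes */

/* Toy checksum (NOT crc32c): running byte sum. Callers seed it with 1. */
static uint32_t toy_csum_update(uint32_t c, const uint8_t *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        c += p[i];
    return c;
}

static uint32_t load_csum(const uint8_t *blk)
{
    uint32_t v;
    memcpy(&v, blk + CSUM_OFF, sizeof(v));
    return v;
}

static void store_csum(uint8_t *blk, uint32_t v)
{
    memcpy(blk + CSUM_OFF, &v, sizeof(v));
}

/* Fill a block with data and stamp a valid checksum into it. */
static void make_block(uint8_t *blk)
{
    assert(blk != NULL);
    for (size_t i = 0; i < CSUM_OFF; i++)
        blk[i] = (uint8_t)(i + 1);
    store_csum(blk, 0);                          /* field is zero while summing */
    store_csum(blk, toy_csum_update(1, blk, BLK_SIZE));
}

/* Unsafe verifier, split in two phases so an interleaving can be replayed. */
struct unsafe_verify { uint32_t saved; };

static void unsafe_begin(struct unsafe_verify *v, uint8_t *blk)
{
    v->saved = load_csum(blk);  /* read stored checksum from the shared buffer */
    store_csum(blk, 0);         /* temporarily zero it IN PLACE (the hazard) */
}

static bool unsafe_end(struct unsafe_verify *v, uint8_t *blk)
{
    uint32_t calc = toy_csum_update(1, blk, BLK_SIZE);
    store_csum(blk, v->saved);  /* write back whatever this verifier saved */
    return calc == v->saved;
}

/* Safe verifier: feeds zeros for the checksum field from a local dummy
 * and never writes to the shared buffer. */
static bool safe_verify(const uint8_t *blk)
{
    static const uint8_t zero[sizeof(uint32_t)] = {0};
    uint32_t calc = toy_csum_update(1, blk, CSUM_OFF);
    calc = toy_csum_update(calc, zero, sizeof(zero));
    return calc == load_csum(blk);
}

/* Replay: A begins, B begins, A ends, B ends. Returns B's verdict. */
static bool replay_race(uint8_t *blk, bool *a_ok)
{
    struct unsafe_verify a, b;
    unsafe_begin(&a, blk);
    unsafe_begin(&b, blk);      /* B saves the zero that A just wrote */
    *a_ok = unsafe_end(&a, blk);
    return unsafe_end(&b, blk); /* B compares against 0 and writes 0 back */
}
```

In this replay, verifier A passes, verifier B reports a checksum failure on
a block that was valid, and B's write-back leaves the stored checksum zeroed
in the shared buffer, so a later safe_verify() on that buffer fails too.
Whether anything like this actually happens in 4.7 is the open question the
patch test is meant to answer.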