From: "Darrick J. Wong"
Subject: Re: e2fsck not fixing deleted inode referenced errors?
Date: Tue, 30 Sep 2014 12:29:55 -0700
Message-ID: <20140930192955.GB9942@birch.djwong.org>
References: <542AEED4.5050303@bitsync.net> <20140930183012.GA9942@birch.djwong.org> <542AF9B8.2090800@bitsync.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linux-ext4@vger.kernel.org
To: Zlatko Calusic
Return-path:
Received: from userp1040.oracle.com ([156.151.31.81]:51644 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750954AbaI3TaG (ORCPT ); Tue, 30 Sep 2014 15:30:06 -0400
Content-Disposition: inline
In-Reply-To: <542AF9B8.2090800@bitsync.net>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

On Tue, Sep 30, 2014 at 08:43:04PM +0200, Zlatko Calusic wrote:
> On 30.09.2014 20:30, Darrick J. Wong wrote:
> >On Tue, Sep 30, 2014 at 07:56:36PM +0200, Zlatko Calusic wrote:
> >>Hope this is the right list to ask this question.
> >>
> >>I have an ext4 filesystem that has a few errors like this:
> >>
> >>Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2):
> >>ext4_lookup:1448: inode #7913865: comm find: deleted inode
> >>referenced: 7912058
> >>Sep 30 19:14:09 atlas kernel: EXT4-fs error (device md2):
> >>ext4_lookup:1448: inode #7913865: comm find: deleted inode
> >>referenced: 7912055
> >>
> >>Yet, when I run e2fsck -fy on it, I have a clean run, no errors are
> >>found and/or fixed. Is this the expected behaviour? What am I
> >>supposed to do to get rid of errors like the above?
> >
> >[I should hope not.]
> >
> >>The filesystem is on a md mirror device, the kernel is 3.17.0-rc7,
> >>e2progs 1.42.12-1 (Debian sid). Could md device somehow interfere? I
> >>ran md check yesterday, but there were no errors.
> >>
> >>BTW, this all started when I got ata2.00: failed command: FLUSH
> >>CACHE EXT error yesterday morning.
> >>I did several runs of e2fsck
> >>before the filesystem came up clean, yet errors like the above are
> >>popping constantly.
> >
> >Normally that kernel message only happens if a dir refers to an inode with
> >link_count and mode set to 0.
> >
> >Is the disk attached to ata2.00 one of the RAID1 mirrors? What was the full
> >error message, and does smartctl -a report anything?
>
> Yes, it is part of the mirror:
>
> ata2.00: ATA-8: WDC WD1002FBYS-02A6B0, 03.00C06, max UDMA/133
> ata2.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
> ata2.00: configured for UDMA/133
>
> md2 : active raid1 sdb2[0] sda2[1]
>       976229760 blocks [2/2] [UU]
>       bitmap: 0/8 pages [0KB], 65536KB chunk
>
> Full error message from the kernel log, together with data check I
> did in the evening:
>
> Sep 29 05:07:51 atlas kernel: ata2.00: exception Emask 0x10 SAct 0x0
> SErr 0x4010000 action 0xe frozen
> Sep 29 05:07:51 atlas kernel: ata2.00: irq_stat 0x00400040,
> connection status changed
> Sep 29 05:07:51 atlas kernel: ata2: SError: { PHYRdyChg DevExch }
> Sep 29 05:07:51 atlas kernel: ata2.00: failed command: FLUSH CACHE EXT
> Sep 29 05:07:51 atlas kernel: ata2.00: cmd
> ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0\x0a res
> 40/00:f4:e2:7f:14/00:00:3a:00:00/40 Emask 0x10 (ATA bus error)
> Sep 29 05:07:51 atlas kernel: ata2.00: status: { DRDY }
> Sep 29 05:07:51 atlas kernel: ata2: hard resetting link
> Sep 29 05:07:57 atlas kernel: ata2: link is slow to respond, please
> be patient (ready=0)
> Sep 29 05:08:00 atlas kernel: ata2: SATA link up 3.0 Gbps (SStatus
> 123 SControl 300)
> Sep 29 05:08:00 atlas kernel: ata2.00: configured for UDMA/133
> Sep 29 05:08:00 atlas kernel: ata2.00: retrying FLUSH 0xea Emask 0x10
> Sep 29 05:08:00 atlas kernel: ata2: EH complete
> Sep 29 05:37:36 atlas kernel: EXT4-fs error (device md2):
> ext4_mb_generate_buddy:757: group 1783, block bitmap and bg
> descriptor inconsistent: 8218 vs 9292 free clusters
> Sep 29 05:37:36 atlas kernel: JBD2: Spotted dirty
> metadata buffer (dev = md2, blocknr = 0). There's a risk of filesystem
> corruption in case of system crash.
> Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2):
> ext4_mb_generate_buddy:757: group 995, block bitmap and bg
> descriptor inconsistent: 15932 vs 15939 free clusters
> Sep 29 16:03:43 atlas kernel: EXT4-fs error (device md2):
> ext4_mb_generate_buddy:757: group 1732, block bitmap and bg
> descriptor inconsistent: 5055 vs 5705 free clusters
> Sep 29 19:24:01 atlas kernel: md: data-check of RAID array md2
> Sep 29 19:24:01 atlas kernel: md: minimum _guaranteed_ speed: 1000
> KB/sec/disk.
> Sep 29 19:24:01 atlas kernel: md: using maximum available idle IO
> bandwidth (but not more than 200000 KB/sec) for data-check.
> Sep 29 19:24:01 atlas kernel: md: using 128k window, over a total of
> 976229760k.
> Sep 29 22:37:53 atlas kernel: md: md2: data-check done.
>
>
> Later on I did several (at least 3) e2fsck runs until the filesystem
> finally was clean of errors. Only to stumble upon new errors today
> that can't be fixed with e2fsck anymore. :(
>
> >
> >It would be interesting to see what "debugfs -R 'stat <7912058>' /dev/md2"
> >returns.
>
> Inode: 7912058 Type: regular Mode: 0644 Flags: 0x80000
> Generation: 252726504 Version: 0x00000000:00000001
> User: 0 Group: 0 Size: 0
> File ACL: 0 Directory ACL: 0
> Links: 0 Blockcount: 0
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014
> atime: 0x5428ccf9:65fa3740 -- Mon Sep 29 05:07:37 2014
> mtime: 0x5428ccf9:667449f0 -- Mon Sep 29 05:07:37 2014
> crtime: 0x53451666:d35246b0 -- Wed Apr 9 11:44:06 2014
> dtime: 0x5428ccf9 -- Mon Sep 29 05:07:37 2014
> Size of extra inode fields: 28
> EXTENTS:

Huh. This looks like a normal deleted file...
just to ensure we're sane, what's the output of:

debugfs -R 'ls <7913865>' /dev/md2
debugfs -R 'ncheck 7913865' /dev/md2

Hoping 7913865 -> /ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters

> At this time there seems to be 7 such files. Here's what it looks like:
>
> {atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# ls -la
> ls: cannot access colormod.so: Input/output error
> ls: cannot access bumpmap.so: Input/output error
> ls: cannot access bumpmap.la: Input/output error
> ls: cannot access testfilter.la: Input/output error
> ls: cannot access testfilter.so: Input/output error
> ls: cannot access colormod.la: Input/output error
> total 8
> drwxr-xr-x 2 root root 4096 Sep 28 11:10 .
> drwxr-xr-x 4 root root 4096 Sep 14 2013 ..
> -????????? ? ? ? ? ? bumpmap.la
> -????????? ? ? ? ? ? bumpmap.so
> -????????? ? ? ? ? ? colormod.la
> -????????? ? ? ? ? ? colormod.so
> -????????? ? ? ? ? ? testfilter.la
> -????????? ? ? ? ? ? testfilter.so
> {atlas} [/ext/backup/atlas/usr/lib/x86_64-linux-gnu/imlib2/filters]# cd
> {atlas} [~]# umount /ext
> {atlas} [~]# time e2fsck -fy /dev/md2
> e2fsck 1.42.12 (29-Aug-2014)
> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/md2: 3863428/61022208 files (0.7% non-contiguous),
> 231256220/244057440 blocks
> e2fsck -fy /dev/md2 9.57s user 2.05s system 5% cpu 3:14.40 total

By any chance did you save the e2fsck logs?

Digging through the e2fsck source code, the only way an inode gets marked used
is if i_link_count > 0 or ... badblocks thinks the inode table block is bad.

What does this say?
debugfs -R 'stat <1>' /dev/md2

> Tried to delete that directory - impossible, i/o errors. I'll try to
> reboot now to see if anything changes...
In theory we can use debugfs to clear the directory and then run e2fsck to clean
up, but let's sanity-check the world before we resort to that. :)

--D

>
> Thanks for your help.
> --
> Zlatko
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
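
As an aside, the "looks like a normal deleted file" reading of the stat output quoted above can be checked mechanically. What follows is only a rough sketch under stated assumptions, not anything debugfs or e2fsck provides: it assumes the field labels appear as they do in this thread (Inode, Mode, Links, dtime), the helper name summarize_stat is made up for illustration, and "looks deleted" is taken to mean a link count of 0 together with a nonzero dtime.

```python
import re

# Abridged copy of the "stat" output quoted in this thread.
SAMPLE = """\
Inode: 7912058 Type: regular Mode: 0644 Flags: 0x80000
Links: 0 Blockcount: 0
dtime: 0x5428ccf9 -- Mon Sep 29 05:07:37 2014
"""

# Field name -> (regex, numeric base). The labels are assumed from the
# output above; other debugfs versions may print them differently.
FIELDS = {
    "inode": (r"Inode:\s*(\d+)", 10),
    "mode": (r"Mode:\s*([0-7]+)", 8),
    "links": (r"Links:\s*(\d+)", 10),
    "dtime": (r"dtime:\s*0x([0-9a-fA-F]+)", 16),
}


def summarize_stat(text):
    """Hypothetical helper: extract a few fields from debugfs 'stat' text."""
    info = {}
    for name, (pattern, base) in FIELDS.items():
        match = re.search(pattern, text)
        info[name] = int(match.group(1), base) if match else None
    # Assumption for this sketch: zero links plus a nonzero deletion time.
    info["looks_deleted"] = info["links"] == 0 and bool(info["dtime"])
    return info


if __name__ == "__main__":
    print(summarize_stat(SAMPLE))
```

In practice the text would come from saving the output of the debugfs command quoted above to a file. For inode 7912058 the sketch reports zero links and a nonzero dtime, which agrees with the reading given in the reply, so the open question remains why the directory still points at it.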