From: Theodore Tso
Subject: Re: Fw: 2.6.28.9: EXT3/NFS inodes corruption
Date: Wed, 22 Apr 2009 18:44:55 -0400
Message-ID: <20090422224455.GV15541@mit.edu>
References: <20090422142424.b4105f4c.akpm@linux-foundation.org>
In-Reply-To: <20090422142424.b4105f4c.akpm@linux-foundation.org>
To: Andrew Morton
Cc: linux-ext4@vger.kernel.org, linux-nfs@vger.kernel.org, Sylvain Rochet

On Wed, Apr 22, 2009 at 02:24:24PM -0700, Andrew Morton wrote:
>
> Is it nfsd, or is it htree?

Well, I see evidence in the bug report of corrupted directory data
structures, so I don't think it's an NFS problem. I would want to
rule out hardware flakiness, though. This could easily be caused by a
hardware problem.

> The kernel log is not really nice with us, here on the NFS Server:
>
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.

Evidence of a corrupted directory entry. We would need to look at the
directory to see whether the directory just had a few bits flipped, or
is pure garbage. The ext3 htree code should do a better job printing
out diagnostics, and flagging the filesystem as corrupt here.

> Apr 2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0

More evidence of a corrupted directory.

> == Going deeper into the problem
>
> Something like that is quite common:
>
> root@bazooka:/data/...# ls -la
> total xxx
> drwxrwx--- 2 xx xx 4096 2009-04-20 03:48 .
> drwxr-xr-x 7 root root 4096 2007-01-21 13:15 ..
> -rw-r--r-- 1 root root 0 2009-04-20 03:48 access.log
> -rw-r--r-- 1 root root 70784145 2009-04-20 00:11 access.log.0
> -rw-r--r-- 1 root root 6347007 2009-04-10 00:07 access.log.10.gz
> -rw-r--r-- 1 root root 6866097 2009-04-09 00:08 access.log.11.gz
> -rw-r--r-- 1 root root 6410119 2009-04-08 00:07 access.log.12.gz
> -rw-r--r-- 1 root root 6488274 2009-04-07 00:08 access.log.13.gz
> ?--------- ? ? ? ? ? access.log.14.gz
> ?--------- ? ? ? ? ? access.log.15.gz
> ?--------- ? ? ? ? ? access.log.16.gz

This is on the client side; what happens when you look at the same
directory from the server side?

>
> fsck.ext3 fixed the filesystem but didn't fix the problem.
>

What do you mean by that? That subsequently, you started seeing
filesystem corruptions again?

Can you send me the output of fsck.ext3? The sorts of filesystem
corruption problems which are fixed by e2fsck are important in
figuring out what is going on.

What happens if you run fsck.ext3 (aka e2fsck) twice: once to fix all
of the problems, and then a second time afterwards? Do the problems
stay fixed?

Suppose you try mounting the filesystem read-only; are things stable
while it is mounted read-only?

> Let's check how inodes numbers are distributed:
>
> # cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
> [... lot of quite unused inodes groups]
> 53 = 3.0371%
> 54 = 26.679% <= mailboxes
> 55 = 2.7026%
> [... lot of quite unused inodes groups]
> 58 = 1.3262%
> 59 = 27.3211% <= mailing lists archives
> 60 = 5.5159%
> [... lot of quite unused inodes groups]
> 171 = 0.0631%
> 172 = 0.1063%
> 173 = 27.2895% <=
> 174 = 44.0623% <=
> 175 = 45.6783% <= websites files
> 176 = 45.8247% <=
> 177 = 36.9376% <=
> 178 = 6.3294%
> 179 = 0.0442%

Yes, that's normal.
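
For reference, the quoted Perl one-liner counts inode numbers per block
of 1,000,000 and prints each count as a percentage of that block. A
minimal Python sketch of the same calculation follows, assuming (as in
the quoted command) one inode number at the start of each input line.
The name bucket_usage is made up for illustration, and unlike the Perl
version this sketch skips lines without a leading number and reports
only non-empty buckets.

```python
import re
from collections import Counter

BUCKET_SIZE = 1_000_000  # same bucket width as the quoted Perl one-liner


def bucket_usage(lines, bucket_size=BUCKET_SIZE):
    """Return {bucket index: percent of that bucket's inode numbers seen}.

    Hypothetical helper: each line is expected to start with an inode
    number; lines that do not are ignored.
    """
    counts = Counter()
    for line in lines:
        match = re.match(r"\d+", line)  # leading digits, like /^(\d+)/
        if match:
            counts[int(match.group()) // bucket_size] += 1
    # Convert raw counts to a percentage of the bucket width.
    return {b: 100.0 * n / bucket_size for b, n in sorted(counts.items())}
```

Passing it the lines of a file such as /root/inodesnumbers should give
one entry per non-empty bucket, comparable to the percentages quoted
above.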
BTW, you can get this sort of information much more easily simply by
using the "dumpe2fs" program.

> We use to fix broken folders by moving them to a quarantine folder and
> by restoring disappeared files from the backup.
>
> So, let's check corrupted inodes number from the quarantine folder:
>
> root@bazooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
> 174293418
> 174506030
> 174506056
> 174506073
> 174506081
> 174506733
> 174507694
> 174507708
> 174507888
> 174507985
> 174508077
> 174508083
> 176473056
> 176473062
> 176473064
>
> Humm... those are quite near to each other 17450... 17647... and are of
> course in the most used inodes "groups"...

When you say "corrupted inodes", how are they corrupted? The errors
you showed on the server side looked like directory corruptions. Were
these inodes directories or data files?

This really smells like a hardware problem to me; my recommendation
would be to run memory tests and also hard drive tests. I'm going to
guess it's more likely the problem is with your hard drives as opposed
to memory --- that would be consistent with your observation that
trying to keep the inodes in memory seems to help.

- Ted