2009-04-20 16:26:13

by Sylvain Rochet

Subject: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


We (TuxFamily) are having some inode corruption on an NFS server.


So, let's start with the facts.


==== NFS Server

Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux

root@bazooka:/usr/src# grep EXT3 /boot/config-2.6.28.9
CONFIG_EXT3_FS=y
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
CONFIG_EXT3_FS_SECURITY=y

root@bazooka:/usr/src# grep NFS /boot/config-2.6.28.9
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
# CONFIG_NFS_V4 is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
# CONFIG_NFSD_V4 is not set
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y

==> We upgraded from 2.6.26.5 to 2.6.28.9; the problem is still here


/dev/md10 on /data type ext3 (rw,noatime,nodiratime,grpquota,commit=5,data=ordered)

==> We used data=writeback and fell back to data=ordered;
the problem is still here


# /etc/exports
/data *(rw,no_root_squash,async,no_subtree_check)
... and lots of exports of subdirs of /data, exported the same way
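
The export options actually in effect can be double-checked on the server
with exportfs from nfs-utils (just a sanity check, output omitted here):

root@bazooka:~# exportfs -v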


NFS-related processes, on the NFS server:

root@bazooka:~# ps aux | grep -E '(nfsd]|lockd]|statd|mountd|idmapd|rquotad|portmap)$'
daemon 1226 0.0 0.0 4824 452 ? Ss Apr11 0:06 /sbin/portmap
root 1703 0.0 0.0 0 0 ? S< 01:29 0:09 [lockd]
root 1704 0.3 0.0 0 0 ? D< 01:29 3:29 [nfsd]
root 1705 0.3 0.0 0 0 ? S< 01:29 3:34 [nfsd]
root 1706 0.3 0.0 0 0 ? S< 01:29 3:32 [nfsd]
root 1707 0.3 0.0 0 0 ? S< 01:29 3:30 [nfsd]
root 1708 0.3 0.0 0 0 ? D< 01:29 3:43 [nfsd]
root 1709 0.3 0.0 0 0 ? D< 01:29 3:43 [nfsd]
root 1710 0.3 0.0 0 0 ? D< 01:29 3:39 [nfsd]
root 1711 0.3 0.0 0 0 ? D< 01:29 3:42 [nfsd]
root 1715 0.0 0.0 5980 576 ? Ss 01:29 0:00 /usr/sbin/rpc.mountd
statd 1770 0.0 0.0 8072 648 ? Ss Apr11 0:00 /sbin/rpc.statd
root 1776 0.0 0.0 23180 536 ? Ss Apr11 0:00 /usr/sbin/rpc.idmapd
root 1785 0.0 0.0 6148 552 ? Ss Apr11 0:00 /usr/sbin/rpc.rquotad

==> We used to run tens of nfsd daemons and fell back to 8,
the default; the problem is still here
==> There are some 'D' processes because of a running data-check


Block device health:

Apr 3 00:28:20 bazooka kernel: md: data-check of RAID array md10
Apr 3 05:11:59 bazooka kernel: md: md10: data-check done.

Apr 5 01:06:01 bazooka kernel: md: data-check of RAID array md10
Apr 5 05:49:42 bazooka kernel: md: md10: data-check done.

Apr 20 16:27:33 bazooka kernel: md: data-check of RAID array md10

md10 : active raid6 sda[0] sdl[11] sdk[10] sdj[9] sdi[8] sdh[7] sdg[6] sdf[5] sde[4] sdd[3] sdc[2] sdb[1]
1433738880 blocks level 6, 64k chunk, algorithm 2 [12/12] [UUUUUUUUUUUU]
[======>..............] check = 30.1% (43176832/143373888) finish=208.1min speed=8020K/sec

==> Everything seems fine


# df -m
/dev/md10 1378166 87170 1290997 7% /data

# df -i
/dev/md10 179224576 3454822 175769754 2% /data



==== NFS Clients

6x Linux cognac 2.6.28.9-grsec #1 SMP Sun Apr 12 13:06:49 CEST 2009 i686 GNU/Linux
5x Linux martini 2.6.28.9-grsec #1 SMP Tue Apr 14 00:01:30 UTC 2009 i686 GNU/Linux
2x Linux armagnac 2.6.28.9 #1 SMP Tue Apr 14 08:59:12 CEST 2009 i686 GNU/Linux

grad@armagnac:~$ grep NFS /boot/config-2.6.28.9
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFS_V3_ACL=y
# CONFIG_NFS_V4 is not set
CONFIG_NFSD=y
CONFIG_NFSD_V2_ACL=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_V3_ACL=y
# CONFIG_NFSD_V4 is not set
CONFIG_NFS_ACL_SUPPORT=y
CONFIG_NFS_COMMON=y

==> We upgraded from 2.6.23.16 and 2.6.24.2 (yeah, the vmsplice upgrade
;-) to 2.6.28.9; the problem is still here


x.x.x.x:/data/... on /data/... type nfs (rw,noexec,nosuid,nodev,async,hard,nfsvers=3,udp,intr,rsize=32768,wsize=32768,timeo=20,addr=x.x.x.x)

==> All NFS exports are mounted this way, sometimes with the 'sync'
option, e.g. for web sessions.
==> Those are often mounted from outside of chroots into chroots,
probably an irrelevant detail


NFS-related processes, on the NFS clients:

root@cognac:~# ps aux | grep -E '(nfsd]|lockd]|statd|mountd|idmapd|rquotad|portmap)$'
daemon 349 0.0 0.0 1904 536 ? Ss Apr12 0:00 /sbin/portmap
statd 360 0.0 0.1 3452 1152 ? Ss Apr12 0:00 /sbin/rpc.statd
root 1190 0.0 0.0 0 0 ? S< Apr12 0:00 [lockd]



==== So, now, going into the problem

The kernel log is not really nice to us; here is what we get on the NFS server:

Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
And so on...

And more recently...
Apr 2 22:19:01 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40780223), 0
Apr 2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0
Apr 11 07:23:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (174301379), 0
Apr 20 08:13:32 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (54942021), 0


Not much in the kernel logs of the NFS clients, and most of the history
is lost, but we caught some of these:

....................: NFS: Buggy server - nlink == 0!
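
Such messages can be gathered from the clients' logs with something like
the following (the exact log path depends on the distribution, and the
rotated history does not go back very far):

root@cognac:~# zgrep -h 'Buggy server - nlink' /var/log/kern.log*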


== Going deeper into the problem

Something like that is quite common:

root@bazooka:/data/...# ls -la
total xxx
drwxrwx--- 2 xx xx 4096 2009-04-20 03:48 .
drwxr-xr-x 7 root root 4096 2007-01-21 13:15 ..
-rw-r--r-- 1 root root 0 2009-04-20 03:48 access.log
-rw-r--r-- 1 root root 70784145 2009-04-20 00:11 access.log.0
-rw-r--r-- 1 root root 6347007 2009-04-10 00:07 access.log.10.gz
-rw-r--r-- 1 root root 6866097 2009-04-09 00:08 access.log.11.gz
-rw-r--r-- 1 root root 6410119 2009-04-08 00:07 access.log.12.gz
-rw-r--r-- 1 root root 6488274 2009-04-07 00:08 access.log.13.gz
?--------- ? ? ? ? ? access.log.14.gz
?--------- ? ? ? ? ? access.log.15.gz
?--------- ? ? ? ? ? access.log.16.gz
?--------- ? ? ? ? ? access.log.17.gz
-rw-r--r-- 1 root root 6950626 2009-04-02 00:07 access.log.18.gz
?--------- ? ? ? ? ? access.log.19.gz
-rw-r--r-- 1 root root 6635884 2009-04-19 00:11 access.log.1.gz
?--------- ? ? ? ? ? access.log.20.gz
?--------- ? ? ? ? ? access.log.21.gz
?--------- ? ? ? ? ? access.log.22.gz
?--------- ? ? ? ? ? access.log.23.gz
?--------- ? ? ? ? ? access.log.24.gz
?--------- ? ? ? ? ? access.log.25.gz
?--------- ? ? ? ? ? access.log.26.gz
-rw-r--r-- 1 root root 6616546 2009-03-24 00:07 access.log.27.gz
?--------- ? ? ? ? ? access.log.28.gz
?--------- ? ? ? ? ? access.log.29.gz
-rw-r--r-- 1 root root 6671875 2009-04-18 00:12 access.log.2.gz
?--------- ? ? ? ? ? access.log.30.gz
-rw-r--r-- 1 root root 6347518 2009-04-17 00:10 access.log.3.gz
-rw-r--r-- 1 root root 6569714 2009-04-16 00:12 access.log.4.gz
-rw-r--r-- 1 root root 7170750 2009-04-15 00:11 access.log.5.gz
-rw-r--r-- 1 root root 6676518 2009-04-14 00:12 access.log.6.gz
-rw-r--r-- 1 root root 6167458 2009-04-13 00:11 access.log.7.gz
-rw-r--r-- 1 root root 5856576 2009-04-12 00:10 access.log.8.gz
-rw-r--r-- 1 root root 6644142 2009-04-11 00:07 access.log.9.gz


root@bazooka:/data/...# cat * # output filtered, only errors
cat: access.log.14.gz: Stale NFS file handle
cat: access.log.15.gz: Stale NFS file handle
cat: access.log.16.gz: Stale NFS file handle
cat: access.log.17.gz: Stale NFS file handle
cat: access.log.19.gz: Stale NFS file handle
cat: access.log.20.gz: Stale NFS file handle
cat: access.log.21.gz: Stale NFS file handle
cat: access.log.22.gz: Stale NFS file handle
cat: access.log.23.gz: Stale NFS file handle
cat: access.log.24.gz: Stale NFS file handle
cat: access.log.25.gz: Stale NFS file handle
cat: access.log.26.gz: Stale NFS file handle
cat: access.log.28.gz: Stale NFS file handle
cat: access.log.29.gz: Stale NFS file handle
cat: access.log.30.gz: Stale NFS file handle


"Stale NFS file handle"... on the NFS Server... hummm...


== Other facts

fsck.ext3 fixed the filesystem but didn't fix the problem.

mkfs.ext3 didn't fix the problem either.

It only concerns files which have been recently modified: logs, awstats
hashfiles, website caches, sessions, locks, and such.

It mainly happens to files which are created on the NFS server itself,
but it's not a hard rule.

Keeping inodes in the server's cache seems to prevent the problem from happening.
( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )


Hummm, it seems to concern files which are quite near to each other;
let's check that:

Let's build up an inode "database"

# find /data -printf '%i %p\n' > /root/inodesnumbers
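
This database also makes it easy to map an inode number from the kernel
warnings back to a path, for example the corrupted directory inode
reported by dx_probe above:

# grep '^40420228 ' /root/inodesnumbers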


Let's check how the inode numbers are distributed:

# cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
[... lot of quite unused inodes groups]
53 = 3.0371%
54 = 26.679% <= mailboxes
55 = 2.7026%
[... lot of quite unused inodes groups]
58 = 1.3262%
59 = 27.3211% <= mailing lists archives
60 = 5.5159%
[... lot of quite unused inodes groups]
171 = 0.0631%
172 = 0.1063%
173 = 27.2895% <=
174 = 44.0623% <=
175 = 45.6783% <= websites files
176 = 45.8247% <=
177 = 36.9376% <=
178 = 6.3294%
179 = 0.0442%

Hummm, all the files are using the same inode "groups"
(groups of a million inodes each).
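
The same bucketing can be done a bit more readably with awk (a sketch
equivalent to the Perl one-liner above: bucket inode numbers by millions
and print each bucket's usage as a percentage of its million inode numbers):

# awk '{ b = int($1/1000000); n[b]++ } END { for (b in n) printf "%d = %.4f%%\n", b, n[b]/10000 }' /root/inodesnumbers | sort -n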

We usually fix broken folders by moving them to a quarantine folder and
restoring the disappeared files from backup.

So, let's check the corrupted inode numbers from the quarantine folder:

root@bazooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
174293418
174506030
174506056
174506073
174506081
174506733
174507694
174507708
174507888
174507985
174508077
174508083
176473056
176473062
176473064

Humm... those are quite near to each other, 17450..., 17647..., and are of
course in the most heavily used inode "groups"...
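
debugfs can also show which block group each of these inodes lives in,
another way to see how clustered they are (example for the first inode
in the list above, same idea for the others):

root@bazooka:~# debugfs -R 'imap <174293418>' /dev/md10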


Open question: can NFS clients steal inode numbers from each other?


I am not sure whether my bug report is good; feel free to ask questions ;)

Best regards,
Sylvain



2009-07-16 17:27:51

by Jan Kara

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,

> We(TuxFamily) are having some inodes corruptions on a NFS server.
>
> So, let's start with the facts.
>
>
> ==== NFS Server
>
> Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
Can you still see the corruption with 2.6.30 kernel?

...
> /dev/md10 on /data type ext3 (rw,noatime,nodiratime,grpquota,commit=5,data=ordered)
>
> ==> We used data=writeback, we fallback to data=ordered,
> problem's still here
>
...
>
> # df -m
> /dev/md10 1378166 87170 1290997 7% /data
1.3 TB, a large filesystem ;).

> # df -i
> /dev/md10 179224576 3454822 175769754 2% /data
>
>
>
> ==== NFS Clients
>
> 6x Linux cognac 2.6.28.9-grsec #1 SMP Sun Apr 12 13:06:49 CEST 2009 i686 GNU/Linux
> 5x Linux martini 2.6.28.9-grsec #1 SMP Tue Apr 14 00:01:30 UTC 2009 i686 GNU/Linux
> 2x Linux armagnac 2.6.28.9 #1 SMP Tue Apr 14 08:59:12 CEST 2009 i686 GNU/Linux
>
> x.x.x.x:/data/... on /data/... type nfs (rw,noexec,nosuid,nodev,async,hard,nfsvers=3,udp,intr,rsize=32768,wsize=32768,timeo=20,addr=x.x.x.x)
>
> ==> All NFS exports are mounted this way, sometimes with the 'sync'
> option, like web sessions.
> ==> Those are often mounted from outside of chroots into chroots,
> useless detail I think
...

> ==== So, now, going into the problem
>
> The kernel log is not really nice with us, here on the NFS Server:
>
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:14 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:16 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Corrupt dir inode 40420228, running e2fsck is recommended.
> Mar 22 06:47:19 bazooka kernel: EXT3-fs warning (device md10): dx_probe: Unrecognised inode hash code 52
> And so on...
If you can still see this problem, could you run: debugfs /dev/md10
and send output of the command:
stat <40420228>
(or whatever the corrupted inode number will be)
and also:
dump <40420228> /tmp/corrupted_dir
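Both can also be run non-interactively, e.g.:
debugfs -R 'stat <40420228>' /dev/md10
debugfs -R 'dump <40420228> /tmp/corrupted_dir' /dev/md10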

> And more recently...
> Apr 2 22:19:01 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40780223), 0
> Apr 2 22:19:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (40491685), 0
> Apr 11 07:23:02 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (174301379), 0
> Apr 20 08:13:32 bazooka kernel: EXT3-fs warning (device md10): ext3_unlink: Deleting nonexistent file (54942021), 0
>
>
> Not much stuff in the kernel log of NFS clients, history is quite lost,
> but we got some of them:
>
> ....................: NFS: Buggy server - nlink == 0!
>
>
> == Going deeper into the problem
>
> Something like that is quite common:
>
> root@bazooka:/data/...# ls -la
> total xxx
> drwxrwx--- 2 xx xx 4096 2009-04-20 03:48 .
> drwxr-xr-x 7 root root 4096 2007-01-21 13:15 ..
> -rw-r--r-- 1 root root 0 2009-04-20 03:48 access.log
> -rw-r--r-- 1 root root 70784145 2009-04-20 00:11 access.log.0
> -rw-r--r-- 1 root root 6347007 2009-04-10 00:07 access.log.10.gz
> -rw-r--r-- 1 root root 6866097 2009-04-09 00:08 access.log.11.gz
> -rw-r--r-- 1 root root 6410119 2009-04-08 00:07 access.log.12.gz
> -rw-r--r-- 1 root root 6488274 2009-04-07 00:08 access.log.13.gz
> ?--------- ? ? ? ? ? access.log.14.gz
> ?--------- ? ? ? ? ? access.log.15.gz
> ?--------- ? ? ? ? ? access.log.16.gz
> ?--------- ? ? ? ? ? access.log.17.gz
> -rw-r--r-- 1 root root 6950626 2009-04-02 00:07 access.log.18.gz
> ?--------- ? ? ? ? ? access.log.19.gz
> -rw-r--r-- 1 root root 6635884 2009-04-19 00:11 access.log.1.gz
> ?--------- ? ? ? ? ? access.log.20.gz
> ?--------- ? ? ? ? ? access.log.21.gz
> ?--------- ? ? ? ? ? access.log.22.gz
> ?--------- ? ? ? ? ? access.log.23.gz
> ?--------- ? ? ? ? ? access.log.24.gz
> ?--------- ? ? ? ? ? access.log.25.gz
> ?--------- ? ? ? ? ? access.log.26.gz
> -rw-r--r-- 1 root root 6616546 2009-03-24 00:07 access.log.27.gz
> ?--------- ? ? ? ? ? access.log.28.gz
> ?--------- ? ? ? ? ? access.log.29.gz
> -rw-r--r-- 1 root root 6671875 2009-04-18 00:12 access.log.2.gz
> ?--------- ? ? ? ? ? access.log.30.gz
> -rw-r--r-- 1 root root 6347518 2009-04-17 00:10 access.log.3.gz
> -rw-r--r-- 1 root root 6569714 2009-04-16 00:12 access.log.4.gz
> -rw-r--r-- 1 root root 7170750 2009-04-15 00:11 access.log.5.gz
> -rw-r--r-- 1 root root 6676518 2009-04-14 00:12 access.log.6.gz
> -rw-r--r-- 1 root root 6167458 2009-04-13 00:11 access.log.7.gz
> -rw-r--r-- 1 root root 5856576 2009-04-12 00:10 access.log.8.gz
> -rw-r--r-- 1 root root 6644142 2009-04-11 00:07 access.log.9.gz
>
>
> root@bazooka:/data/...# cat * # output filtered, only errors
> cat: access.log.14.gz: Stale NFS file handle
> cat: access.log.15.gz: Stale NFS file handle
> cat: access.log.16.gz: Stale NFS file handle
> cat: access.log.17.gz: Stale NFS file handle
> cat: access.log.19.gz: Stale NFS file handle
> cat: access.log.20.gz: Stale NFS file handle
> cat: access.log.21.gz: Stale NFS file handle
> cat: access.log.22.gz: Stale NFS file handle
> cat: access.log.23.gz: Stale NFS file handle
> cat: access.log.24.gz: Stale NFS file handle
> cat: access.log.25.gz: Stale NFS file handle
> cat: access.log.26.gz: Stale NFS file handle
> cat: access.log.28.gz: Stale NFS file handle
> cat: access.log.29.gz: Stale NFS file handle
> cat: access.log.30.gz: Stale NFS file handle
>
>
> "Stale NFS file handle"... on the NFS Server... hummm...
>
>
> == Other facts
>
> fsck.ext3 fixed the filesystem but didn't fix the problem.
>
> mkfs.ext3 didn't fix the problem either.
You might want to try disabling the DIR_INDEX feature and see whether
the corruption still occurs...
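A sketch of how that could be done (to be double-checked against the
tune2fs and e2fsck man pages before touching the live filesystem): with
/data unmounted, run
tune2fs -O ^dir_index /dev/md10
e2fsck -fD /dev/md10
so that e2fsck rebuilds the existing directories without the htree index.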

> It only concerns files which have been recently modified, logs, awstats
> hashfiles, websites caches, sessions, locks, and such.
>
> It mainly happens to files which are created on the NFS server itself,
> but it's not a hard rule.
>
> Keeping inodes into servers' cache seems to prevent the problem to happen.
> ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
I'd guess just because they don't have to be read from disk where they
get corrupted.

> Hummm, it seems to concern files which are quite near to each others,
> let's check that:
>
> Let's build up an inode "database"
>
> # find /data -printf '%i %p\n' > /root/inodesnumbers
>
>
> Let's check how inodes numbers are distributed:
>
> # cat /root/inodesnumbers | perl -e 'use Data::Dumper; my @pof; while(<>){my ( $inode ) = ( $_ =~ /^(\d+)/ ); my $hop = int($inode/1000000); $pof[$hop]++; }; for (0 .. $#pof) { print $_." = ".($pof[$_]/10000)."%\n" }'
> [... lot of quite unused inodes groups]
> 53 = 3.0371%
> 54 = 26.679% <= mailboxes
> 55 = 2.7026%
> [... lot of quite unused inodes groups]
> 58 = 1.3262%
> 59 = 27.3211% <= mailing lists archives
> 60 = 5.5159%
> [... lot of quite unused inodes groups]
> 171 = 0.0631%
> 172 = 0.1063%
> 173 = 27.2895% <=
> 174 = 44.0623% <=
> 175 = 45.6783% <= websites files
> 176 = 45.8247% <=
> 177 = 36.9376% <=
> 178 = 6.3294%
> 179 = 0.0442%
>
> Hummm, all the files are using the same inodes "groups".
> (groups of a million of inodes)
Interesting, but it may well be just due to the way these files get
created / updated.

> We use to fix broken folders by moving them to a quarantine folder and
> by restoring disappeared files from the backup.
>
> So, let's check corrupted inodes number from the quarantine folder:
>
> root@bazooka:/data/path/to/rep/of/quarantine/folders# find . -mindepth 1 -maxdepth 1 -printf '%i\n' | sort -n
> 174293418
> 174506030
> 174506056
> 174506073
> 174506081
> 174506733
> 174507694
> 174507708
> 174507888
> 174507985
> 174508077
> 174508083
> 176473056
> 176473062
> 176473064
>
> Humm... those are quite near to each other 17450... 17647... and are of
> course in the most used inodes "groups"...
>
>
> Open question: are NFS clients can steal inodes numbers from each others ?
>
>
> I am not sure whether my bug report is good, feel free to ask questions ;)


Honza
--
Jan Kara <[email protected]>
SuSE CR Labs

2009-07-25 16:39:07

by Sylvain Rochet

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,

Sorry for the late answer, we were waiting for the problem to happen again ;)


On Thu, Jul 16, 2009 at 07:27:49PM +0200, Jan Kara wrote:
> Hi,
>
> > We(TuxFamily) are having some inodes corruptions on a NFS server.
> >
> > So, let's start with the facts.
> >
> >
> > ==== NFS Server
> >
> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
>
> Can you still see the corruption with 2.6.30 kernel?

Not upgraded yet, we'll give it a try.


> If you can still see this problem, could you run: debugfs /dev/md10
> and send output of the command:
> stat <40420228>
> (or whatever the corrupted inode number will be)
> and also:
> dump <40420228> /tmp/corrupted_dir


One inode got corrupted recently; here is the output:


root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
total 64
88539836 drwxr-sr-x 2 18804 23084 4096 2009-07-25 07:53 .
88539821 drwxr-sr-x 20 18804 23084 4096 2008-08-20 10:14 ..
88541578 -rw-rw-rw- 1 18804 23084 471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
88541465 -rw-rw-rw- 1 18804 23084 6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
88541471 -rw-rw-rw- 1 18804 23084 1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
88541549 -rw-rw-rw- 1 18804 23084 2813 2009-07-25 03:04 INDEX-.edfac52c
88541366 -rw-rw-rw- 1 18804 23084 0 2008-08-17 20:44 .ok
? ?--------- ? ? ? ? ? spip%3Farticle19.f8740dca
88541671 -rw-rw-rw- 1 18804 23084 5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
88541460 -rw-rw-rw- 1 18804 23084 5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
88540284 -rw-rw-rw- 1 18804 23084 3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
88541539 -rw-rw-rw- 1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz


root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
cat: spip%3Farticle19.f8740dca: Stale NFS file handle


root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)


debugfs: stat <88539836>

Inode: 88539836 Type: directory Mode: 0755 Flags: 0x0 Generation: 791796957
User: 18804 Group: 23084 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 2 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
Size of extra inode fields: 4
BLOCKS:
(0):177096928
TOTAL: 1


debugfs: ls <88539836>

88539836 (12) . 88539821 (32) .. 88541366 (12) .ok
88541465 (56) -inc_rss_item-32-wa.23d91cc2
88541539 (40) spip%3Fpage%3Djquery.cce608b6.gz
88540284 (40) spip%3Fpage%3Dforum-30.63b2c1b1
88541460 (28) spip%3Fmot5.f3e9adda
88541471 (160) -inc_rubriques-17-wa.f2f152f0
88541549 (24) INDEX-.edfac52c 88541578 (284) -inc_forum-10-wa.3cb1921f
88541562 (36) spip%3Farticle19.f8740dca
88541671 (3372) spip%3Fauteur1.c64f7f7e


debugfs: stat <88541562>

Inode: 88541562 Type: regular Mode: 0666 Flags: 0x0 Generation: 860068541
User: 18804 Group: 23084 Size: 0
File ACL: 0 Directory ACL: 0
Links: 0 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
Size of extra inode fields: 4
BLOCKS:


debugfs: dump <88539836> /tmp/corrupted_dir

(file attached)


> You might want to try disabling the DIR_INDEX feature and see whether
> the corruption still occurs...

We'll try.


> > Keeping inodes into servers' cache seems to prevent the problem to happen.
> > ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
>
> I'd guess just because they don't have to be read from disk where they
> get corrupted.

Exactly.


> Interesting, but it may well be just by the way how these files get
> created / updated.

Yes, this is only because of that.



Additional data that may help: we replaced the storage server with
something slower (fewer CPUs, fewer cores, ...). We are still getting
some corruption, but with nothing in common with the former server.

The data are stored on two disk arrays. The primary one is made of
Fibre Channel disks behind a simple Fibre Channel card, with software
RAID (md, raid6). The secondary one is made of SCSI disks behind a
hardware RAID card. We got corruption on both, depending on which one
was in production at the time.


Sylvain



2009-07-27 15:42:56

by Jan Kara

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,

On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> Sorry for the late answer, waiting for the problem to happen again ;)
No problem.

> On Thu, Jul 16, 2009 at 07:27:49PM +0200, Jan Kara wrote:
> > Hi,
> >
> > > We(TuxFamily) are having some inodes corruptions on a NFS server.
> > >
> > > So, let's start with the facts.
> > >
> > >
> > > ==== NFS Server
> > >
> > > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
> >
> > Can you still see the corruption with 2.6.30 kernel?
> Not upgraded yet, we'll give a try.
>
> > If you can still see this problem, could you run: debugfs /dev/md10
> > and send output of the command:
> > stat <40420228>
> > (or whatever the corrupted inode number will be)
> > and also:
> > dump <40420228> /tmp/corrupted_dir
>
> One inode get corrupted recently, here is the output:
>
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> total 64
> 88539836 drwxr-sr-x 2 18804 23084 4096 2009-07-25 07:53 .
> 88539821 drwxr-sr-x 20 18804 23084 4096 2008-08-20 10:14 ..
> 88541578 -rw-rw-rw- 1 18804 23084 471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> 88541465 -rw-rw-rw- 1 18804 23084 6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> 88541471 -rw-rw-rw- 1 18804 23084 1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> 88541549 -rw-rw-rw- 1 18804 23084 2813 2009-07-25 03:04 INDEX-.edfac52c
> 88541366 -rw-rw-rw- 1 18804 23084 0 2008-08-17 20:44 .ok
> ? ?--------- ? ? ? ? ? spip%3Farticle19.f8740dca
> 88541671 -rw-rw-rw- 1 18804 23084 5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> 88541460 -rw-rw-rw- 1 18804 23084 5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> 88540284 -rw-rw-rw- 1 18804 23084 3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> 88541539 -rw-rw-rw- 1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
OK, so we couldn't stat a directory...

> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> cat: spip%3Farticle19.f8740dca: Stale NFS file handle
This is probably the misleading output from ext3_iget(). It should give
you EIO in the latest kernel.

> root@bazooka:~# debugfs /dev/md10
> debugfs 1.40-WIP (14-Nov-2006)
>
> debugfs: stat <88539836>
>
> Inode: 88539836 Type: directory Mode: 0755 Flags: 0x0 Generation: 791796957
> User: 18804 Group: 23084 Size: 4096
> File ACL: 0 Directory ACL: 0
> Links: 2 Blockcount: 8
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> Size of extra inode fields: 4
> BLOCKS:
> (0):177096928
> TOTAL: 1
>
>
> debugfs: ls <88539836>
>
> 88539836 (12) . 88539821 (32) .. 88541366 (12) .ok
> 88541465 (56) -inc_rss_item-32-wa.23d91cc2
> 88541539 (40) spip%3Fpage%3Djquery.cce608b6.gz
> 88540284 (40) spip%3Fpage%3Dforum-30.63b2c1b1
> 88541460 (28) spip%3Fmot5.f3e9adda
> 88541471 (160) -inc_rubriques-17-wa.f2f152f0
> 88541549 (24) INDEX-.edfac52c 88541578 (284) -inc_forum-10-wa.3cb1921f
> 88541562 (36) spip%3Farticle19.f8740dca
> 88541671 (3372) spip%3Fauteur1.c64f7f7e
The directory itself looks fine...

> debugfs: stat <88541562>
>
> Inode: 88541562 Type: regular Mode: 0666 Flags: 0x0 Generation: 860068541
> User: 18804 Group: 23084 Size: 0
> File ACL: 0 Directory ACL: 0
> Links: 0 Blockcount: 0
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> Size of extra inode fields: 4
> BLOCKS:
Ah, OK, here's the problem. The directory points to a file which is
obviously deleted (note the "Links: 0"). All the content of the inode seems
to indicate that the file was correctly deleted (you might check that the
corresponding bit in the bitmap is cleared via: "icheck 88541562").
The question is how it could happen that the directory still points to the
inode. Really strange. It looks as if we've lost a write to the directory
but I don't see how. Are there any suspicious kernel messages in this case?

> debugfs: dump <88539836> /tmp/corrupted_dir
>
> (file attached)
>
>
> > You might want to try disabling the DIR_INDEX feature and see whether
> > the corruption still occurs...
>
> We'll try.
It probably won't help. This particular directory had just one block so
DIR_INDEX had no effect on it.

> > > Keeping inodes into servers' cache seems to prevent the problem to happen.
> > > ( yeah, # while true ; do ionice -c3 find /data -size +0 > /dev/null ; done )
> >
> > I'd guess just because they don't have to be read from disk where they
> > get corrupted.
>
> Exactly.
>
>
> > Interesting, but it may well be just by the way how these files get
> > created / updated.
>
> Yes, this is only because of that.
>
> Additional data that may help, we replaced the storage server to
> something slower (less number of CPU, less number of cores, ...). We are
> still getting some corruption but with non-common sense with the former
> server.
>
> The data are stored on two storage arrays of disks. The primary one is
> made of fiber-channel disks used through a simple fiber-channel card,
> RAID soft with md, raid6. The secondary one is made of SCSI disks used
> through a RAID-hard card. We got corruption on both, depending on
> the one currently used into production.
OK, so it's probably not a storage device problem. Good to know.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-07-28 11:27:19

by Sylvain Rochet

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > >
> > > Can you still see the corruption with 2.6.30 kernel?
> >
> > Not upgraded yet, we'll give a try.

Done, now featuring 2.6.30.3 ;)


> > > If you can still see this problem, could you run: debugfs /dev/md10
> > > and send output of the command:
> > > stat <40420228>
> > > (or whatever the corrupted inode number will be)
> > > and also:
> > > dump <40420228> /tmp/corrupted_dir
> >
> > One inode get corrupted recently, here is the output:
> >
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> > total 64
> > 88539836 drwxr-sr-x 2 18804 23084 4096 2009-07-25 07:53 .
> > 88539821 drwxr-sr-x 20 18804 23084 4096 2008-08-20 10:14 ..
> > 88541578 -rw-rw-rw- 1 18804 23084 471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> > 88541465 -rw-rw-rw- 1 18804 23084 6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> > 88541471 -rw-rw-rw- 1 18804 23084 1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> > 88541549 -rw-rw-rw- 1 18804 23084 2813 2009-07-25 03:04 INDEX-.edfac52c
> > 88541366 -rw-rw-rw- 1 18804 23084 0 2008-08-17 20:44 .ok
> > ? ?--------- ? ? ? ? ? spip%3Farticle19.f8740dca
> > 88541671 -rw-rw-rw- 1 18804 23084 5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> > 88541460 -rw-rw-rw- 1 18804 23084 5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> > 88540284 -rw-rw-rw- 1 18804 23084 3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> > 88541539 -rw-rw-rw- 1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
> OK, so we couldn't stat a directory...
>
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > cat: spip%3Farticle19.f8740dca: Stale NFS file handle
> This is probably the misleading output from ext3_iget(). It should give
> you EIO in the latest kernel.

root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
cat: spip%3Farticle19.f8740dca: Input/output error

It makes much more sense now. We thought the problem was around NFS due
to the previous error message; actually that is probably not the right
path to look down.


> > root@bazooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> >
> > debugfs: stat <88539836>
> >
> > Inode: 88539836 Type: directory Mode: 0755 Flags: 0x0 Generation: 791796957
> > User: 18804 Group: 23084 Size: 4096
> > File ACL: 0 Directory ACL: 0
> > Links: 2 Blockcount: 8
> > Fragment: Address: 0 Number: 0 Size: 0
> > ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> > mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > Size of extra inode fields: 4
> > BLOCKS:
> > (0):177096928
> > TOTAL: 1
> >
> >
> > debugfs: ls <88539836>
> >
> > 88539836 (12) . 88539821 (32) .. 88541366 (12) .ok
> > 88541465 (56) -inc_rss_item-32-wa.23d91cc2
> > 88541539 (40) spip%3Fpage%3Djquery.cce608b6.gz
> > 88540284 (40) spip%3Fpage%3Dforum-30.63b2c1b1
> > 88541460 (28) spip%3Fmot5.f3e9adda
> > 88541471 (160) -inc_rubriques-17-wa.f2f152f0
> > 88541549 (24) INDEX-.edfac52c 88541578 (284) -inc_forum-10-wa.3cb1921f
> > 88541562 (36) spip%3Farticle19.f8740dca
> > 88541671 (3372) spip%3Fauteur1.c64f7f7e
> The directory itself looks fine...
>
> > debugfs: stat <88541562>
> >
> > Inode: 88541562 Type: regular Mode: 0666 Flags: 0x0 Generation: 860068541
> > User: 18804 Group: 23084 Size: 0
> > File ACL: 0 Directory ACL: 0
> > Links: 0 Blockcount: 0
> > Fragment: Address: 0 Number: 0 Size: 0
> > ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> > mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > Size of extra inode fields: 4
> > BLOCKS:
>
> Ah, OK, here's the problem. The directory points to a file which is
> obviously deleted (note the "Links: 0"). All the content of the inode seems
> to indicate that the file was correctly deleted (you might check that the
> corresponding bit in the bitmap is cleared via: "icheck 88541562").

root@bazooka:~# debugfs /dev/md10
debugfs 1.40-WIP (14-Nov-2006)
debugfs: icheck 88541562
Block Inode number
88541562 <block not found>


> The question is how it could happen the directory still points to the
> inode. Really strange. It looks as if we've lost a write to the directory
> but I don't see how. Are there any suspitious kernel messages in this case?

There was nothing for a while, but since the reboot there are some
messages about this inode:

EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562


> > We'll try.
>
> It probably won't help. This particular directory had just one block so
> DIR_INDEX had no effect on it.

Let's keep dir_index for now, then.


> OK, so it's probably not a storage device problem. Good to know.

We also thought about motherboard, CPU, or chassis issues, but
everything has been replaced.


The check of the MD raid6 array always ends happily:

Jul 5 01:06:01 bazooka kernel: md: data-check of RAID array md10
Jul 5 01:06:01 bazooka kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jul 5 01:06:01 bazooka kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jul 5 01:06:01 bazooka kernel: md: using 128k window, over a total of 143373888 blocks.
Jul 5 04:28:28 bazooka kernel: md: md10: data-check done.


We never saw modifications to the data of the files themselves; maybe it
happened, but we never saw any evidence of that. Of course, due to the
modification of the filesystem structure, we saw files replaced by other
files ;)


Sylvain



2009-07-28 13:52:28

by Jan Kara

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > >
> > > > Can you still see the corruption with 2.6.30 kernel?
> > >
> > > Not upgraded yet, we'll give a try.
>
> Done, now featuring 2.6.30.3 ;)
OK, drop me an email if you see corruption with this kernel as well.

> > > > If you can still see this problem, could you run: debugfs /dev/md10
> > > > and send output of the command:
> > > > stat <40420228>
> > > > (or whatever the corrupted inode number will be)
> > > > and also:
> > > > dump <40420228> /tmp/corrupted_dir
> > >
> > > One inode get corrupted recently, here is the output:
> > >
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# ls -lai
> > > total 64
> > > 88539836 drwxr-sr-x 2 18804 23084 4096 2009-07-25 07:53 .
> > > 88539821 drwxr-sr-x 20 18804 23084 4096 2008-08-20 10:14 ..
> > > 88541578 -rw-rw-rw- 1 18804 23084 471 2009-07-25 04:55 -inc_forum-10-wa.3cb1921f
> > > 88541465 -rw-rw-rw- 1 18804 23084 6693 2009-07-25 07:53 -inc_rss_item-32-wa.23d91cc2
> > > 88541471 -rw-rw-rw- 1 18804 23084 1625 2009-07-25 07:53 -inc_rubriques-17-wa.f2f152f0
> > > 88541549 -rw-rw-rw- 1 18804 23084 2813 2009-07-25 03:04 INDEX-.edfac52c
> > > 88541366 -rw-rw-rw- 1 18804 23084 0 2008-08-17 20:44 .ok
> > > ? ?--------- ? ? ? ? ? spip%3Farticle19.f8740dca
> > > 88541671 -rw-rw-rw- 1 18804 23084 5619 2009-07-24 21:07 spip%3Fauteur1.c64f7f7e
> > > 88541460 -rw-rw-rw- 1 18804 23084 5636 2009-07-24 19:30 spip%3Fmot5.f3e9adda
> > > 88540284 -rw-rw-rw- 1 18804 23084 3802 2009-07-25 16:10 spip%3Fpage%3Dforum-30.63b2c1b1
> > > 88541539 -rw-rw-rw- 1 18804 23084 12972 2009-07-25 11:14 spip%3Fpage%3Djquery.cce608b6.gz
> > OK, so we couldn't stat a directory...
> >
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > > cat: spip%3Farticle19.f8740dca: Stale NFS file handle
> > This is probably the misleading output from ext3_iget(). It should give
> > you EIO in the latest kernel.
>
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> cat: spip%3Farticle19.f8740dca: Input/output error
>
> It has much more sense now. We thought the problem was around NFS due
> the the previous error message, actually this is probably not the best
> looking path.
Yes, EIO makes more sense. I think the problem is NFS-related anyway
though :). But I don't have a clue how it can happen yet. Maybe I can try
adding some low-cost debugging checks if you'd be willing to run such a
kernel...
I'm adding linux-nfs to CC just in case someone has an idea.

> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > >
> > > debugfs: stat <88539836>
> > >
> > > Inode: 88539836 Type: directory Mode: 0755 Flags: 0x0 Generation: 791796957
> > > User: 18804 Group: 23084 Size: 4096
> > > File ACL: 0 Directory ACL: 0
> > > Links: 2 Blockcount: 8
> > > Fragment: Address: 0 Number: 0 Size: 0
> > > ctime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > > atime: 0x4a0de585 -- Fri May 15 23:58:29 2009
> > > mtime: 0x4a6a9dd5 -- Sat Jul 25 07:53:25 2009
> > > Size of extra inode fields: 4
> > > BLOCKS:
> > > (0):177096928
> > > TOTAL: 1
> > >
> > >
> > > debugfs: ls <88539836>
> > >
> > > 88539836 (12) . 88539821 (32) .. 88541366 (12) .ok
> > > 88541465 (56) -inc_rss_item-32-wa.23d91cc2
> > > 88541539 (40) spip%3Fpage%3Djquery.cce608b6.gz
> > > 88540284 (40) spip%3Fpage%3Dforum-30.63b2c1b1
> > > 88541460 (28) spip%3Fmot5.f3e9adda
> > > 88541471 (160) -inc_rubriques-17-wa.f2f152f0
> > > 88541549 (24) INDEX-.edfac52c 88541578 (284) -inc_forum-10-wa.3cb1921f
> > > 88541562 (36) spip%3Farticle19.f8740dca
> > > 88541671 (3372) spip%3Fauteur1.c64f7f7e
> > The directory itself looks fine...
> >
> > > debugfs: stat <88541562>
> > >
> > > Inode: 88541562 Type: regular Mode: 0666 Flags: 0x0 Generation: 860068541
> > > User: 18804 Group: 23084 Size: 0
> > > File ACL: 0 Directory ACL: 0
> > > Links: 0 Blockcount: 0
> > > Fragment: Address: 0 Number: 0 Size: 0
> > > ctime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > atime: 0x4a6a612f -- Sat Jul 25 03:34:39 2009
> > > mtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > dtime: 0x4a6a8fac -- Sat Jul 25 06:53:00 2009
> > > Size of extra inode fields: 4
> > > BLOCKS:
> >
> > Ah, OK, here's the problem. The directory points to a file which is
> > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > to indicate that the file was correctly deleted (you might check that the
> > corresponding bit in the bitmap is cleared via: "icheck 88541562").
>
> root@bazooka:~# debugfs /dev/md10
> debugfs 1.40-WIP (14-Nov-2006)
> debugfs: icheck 88541562
> Block Inode number
> 88541562 <block not found>
Ah, wrong debugfs command. I should have written:
testi <88541562>

> > The question is how it could happen the directory still points to the
> > inode. Really strange. It looks as if we've lost a write to the directory
> > but I don't see how. Are there any suspitious kernel messages in this case?
>
> There were nothing for a while, but since the reboot there are some
> about this inode:
>
> EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
Yes, that's to be expected given the corruption. Any NFS error messages?

> > > We'll try.
> >
> > It probably won't help. This particular directory had just one block so
> > DIR_INDEX had no effect on it.
>
> Let's keep dir_index for now, then.
OK.

> > OK, so it's probably not a storage device problem. Good to know.
>
> We also thought about motherboard, CPU, or chassis issues, but
> everything has been replaced.
>
>
> The check of the MD raid6 array always ends happily:
>
> Jul 5 01:06:01 bazooka kernel: md: data-check of RAID array md10
> Jul 5 01:06:01 bazooka kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> Jul 5 01:06:01 bazooka kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> Jul 5 01:06:01 bazooka kernel: md: using 128k window, over a total of 143373888 blocks.
> Jul 5 04:28:28 bazooka kernel: md: md10: data-check done.
>
>
> We never saw modification to the data of files themselves, maybe it
> happened, but we never saw any evidence of that. Of course, due to the
> modification of the filesystem structure, we saw files replaced by other
> files ;)

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR

2009-07-28 16:40:43

by Daniel J Blueman

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

On Apr 20, 5:30 pm, Sylvain Rochet <[email protected]> wrote:
> Hi,
>
> We(TuxFamily) are having some inodes corruptions on a NFS server.
>
> So, let's start with the facts.
>
> ==== NFS Server
>
> Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
[snip]

Can you do a 'lspci -v' on the server please?

Daniel
--
Daniel J Blueman

2009-07-28 16:41:46

by Sylvain Rochet

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > >
> > > > > Can you still see the corruption with 2.6.30 kernel?
> > > >
> > > > Not upgraded yet, we'll give a try.
> >
> > Done, now featuring 2.6.30.3 ;)
>
> OK, drop me an email if you will see corruption also with this kernel.

Let's move the corrupted directory out of the way ;)

root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok
rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops


> > > This is probably the misleading output from ext3_iget(). It should give
> > > you EIO in the latest kernel.
> >
> > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > cat: spip%3Farticle19.f8740dca: Input/output error
> >
> > It has much more sense now. We thought the problem was around NFS due
> > the the previous error message, actually this is probably not the best
> > looking path.
>
> Yes, EIO makes more sence. I think the problem is NFS connected anyway
> though :). But I don't have a clue how it can happen yet. Maybe I can try
> adding some low-cost debugging checks if you'd be willing to run such
> kernel...

No problem at all; we have 24/7/365 physical access and we don't need
to provide high-availability services.

Anyway, the hosted data aren't that important and there is little or even
no need for strict confidentiality, so we will be happy to provide ssh
access to whoever would like to look deeper into this issue.


> I'm adding to CC linux-nfs just in case someone has an idea.
>
> > > Ah, OK, here's the problem. The directory points to a file which is
> > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > to indicate that the file was correctly deleted (you might check that the
> > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> >
> > root@bazooka:~# debugfs /dev/md10
> > debugfs 1.40-WIP (14-Nov-2006)
> > debugfs: icheck 88541562
> > Block Inode number
> > 88541562 <block not found>
>
> Ah, wrong debugfs command. I should have written:
> testi <88541562>

debugfs: testi <88541562>
Inode 88541562 is not in use


> > > The question is how it could happen the directory still points to the
> > > inode. Really strange. It looks as if we've lost a write to the directory
> > > but I don't see how. Are there any suspitious kernel messages in this case?
> >
> > There were nothing for a while, but since the reboot there are some
> > about this inode:
> >
> > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
>
> Yes, that's to be expected given the corruption any NFS error messages?

There are some error messages on NFS clients, however they are quite old.

Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
May 3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
May 3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
May 7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
May 7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!

This is obviously related to the corruption.



Sylvain



2009-07-28 16:45:23

by Sylvain Rochet

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Tue, Jul 28, 2009 at 09:40:42AM -0700, Daniel J Blueman wrote:
> On Apr 20, 5:30 pm, Sylvain Rochet <[email protected]> wrote:
> > Hi,
> >
> > We(TuxFamily) are having some inodes corruptions on a NFS server.
> >
> > So, let's start with the facts.
> >
> > ==== NFS Server
> >
> > Linux bazooka 2.6.28.9 #1 SMP Mon Mar 30 12:58:22 CEST 2009 x86_64 GNU/Linux
> [snip]
>
> Can you do a 'lspci -v' on the server please?

Of course yes.

I attached the 'lspci -v' output of the previous and the current storage
servers.


Sylvain



2009-07-28 21:12:20

by J. Bruce Fields

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

On Tue, Jul 28, 2009 at 06:41:42PM +0200, Sylvain Rochet wrote:
> Hi,
>
>
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > >
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > >
> > > > > Not upgraded yet, we'll give a try.
> > >
> > > Done, now featuring 2.6.30.3 ;)
> >
> > OK, drop me an email if you will see corruption also with this kernel.
>
> Lets move out the corrupted directory ;)
>
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
>
>
> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > >
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > >
> > > It has much more sense now. We thought the problem was around NFS due
> > > the the previous error message, actually this is probably not the best
> > > looking path.
> >
> > Yes, EIO makes more sence. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such
> > kernel...
>
> Without any problem, we have 24/7/365 physical access and we don't need
> to provide high-availability services.
>
> Anyway, the data hosted aren't that important, there is little or even
> no need for strict confidentiality, so we will be happy to provide ssh
> access to whom would like to look deeper into this issue.
>
>
> > I'm adding to CC linux-nfs just in case someone has an idea.
> >
> > > > Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > >
> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs: icheck 88541562
> > > Block Inode number
> > > 88541562 <block not found>
> >
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
>
> debugfs: testi <88541562>
> Inode 88541562 is not in use
>
>
> > > > The question is how it could happen the directory still points to the
> > > > inode. Really strange. It looks as if we've lost a write to the directory
> > > > but I don't see how. Are there any suspitious kernel messages in this case?
> > >
> > > There were nothing for a while, but since the reboot there are some
> > > about this inode:
> > >
> > > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> >
> > Yes, that's to be expected given the corruption any NFS error messages?
>
> There are some error messages on NFS clients, however they are quite old.
>
> Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
> May 3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
> May 3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
> May 7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
> May 7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
> May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!
>
> This is obviously related to the corruption.

It might be interesting to know whether the file that we returned to the
client with nlink 0 was the same that you later saw corruption on; maybe
adding a printk of the inode number there would help.

Googling around on that error message, a previous thread:

http://marc.info/?t=107429333300004&r=1&w=4

seems to conclude it's a bug, but doesn't follow up with a fix. And I
don't see any mention of possible filesystem corruption.

Is NFSv4 involved here? I wonder if something that might otherwise be
only a problem for the client could become a problem for the server if
it attempts to do further operations with an unlinked inode in a
compound operation that follows a lookup.

--b.

2009-07-29 12:58:19

by Jan Kara

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

On Tue 28-07-09 18:41:42, Sylvain Rochet wrote:
> Hi,
>
>
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > >
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > >
> > > > > Not upgraded yet, we'll give a try.
> > >
> > > Done, now featuring 2.6.30.3 ;)
> >
> > OK, drop me an email if you will see corruption also with this kernel.
>
> Lets move out the corrupted directory ;)
>
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
Actually, leaving that file in the filesystem can potentially lead to
strange effects, because eventually the inode that "spip%3Farticle19.f8740dca"
points to gets reallocated and then you can get e.g. a hardlinked
directory. On the other hand, having it in lost+found should be safe enough.

> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > >
> > > root@bazooka:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > >
> > > It has much more sense now. We thought the problem was around NFS due
> > > the the previous error message, actually this is probably not the best
> > > looking path.
> >
> > Yes, EIO makes more sence. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such
> > kernel...
>
> Without any problem, we have 24/7/365 physical access and we don't need
> to provide high-availability services.
Cool, I'll try to cook up something then.

> Anyway, the data hosted aren't that important, there is little or even
> no need for strict confidentiality, so we will be happy to provide ssh
> access to whom would like to look deeper into this issue.
I don't need to go that far (at least for now) but thanks for the offer.

> > I'm adding to CC linux-nfs just in case someone has an idea.
> >
> > > > Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > >
> > > root@bazooka:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs: icheck 88541562
> > > Block Inode number
> > > 88541562 <block not found>
> >
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
>
> debugfs: testi <88541562>
> Inode 88541562 is not in use
Yes, again this confirms that the inode was just correctly deleted. But
somehow a pointer to it remained in the directory.

> > > > The question is how it could happen the directory still points to the
> > > > inode. Really strange. It looks as if we've lost a write to the directory
> > > > but I don't see how. Are there any suspitious kernel messages in this case?
> > >
> > > There were nothing for a while, but since the reboot there are some
> > > about this inode:
> > >
> > > EXT3-fs error (device md10): ext3_lookup: deleted inode referenced: 88541562
> >
> > Yes, that's to be expected given the corruption any NFS error messages?
>
> There are some error messages on NFS clients, however they are quite old.
>
> Apr 19 15:38:21 gin kernel: NFS: Buggy server - nlink == 0!
> May 3 20:00:52 gin kernel: NFS: Buggy server - nlink == 0!
> May 3 23:24:03 gin kernel: NFS: Buggy server - nlink == 0!
> May 7 11:40:57 gin kernel: NFS: Buggy server - nlink == 0!
> May 7 14:41:02 gin kernel: NFS: Buggy server - nlink == 0!
> May 26 11:10:42 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 11:13:28 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:34:39 cognac kernel: NFS: Buggy server - nlink == 0!
> May 26 12:39:43 cognac kernel: NFS: Buggy server - nlink == 0!
>
> This is obviously related to the corruption.
Yes, this is a consequence of the bug - somebody deleted an inode because
i_nlink dropped down to 0 but the inode was in fact still referenced.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR