2009-08-03 22:29:01

by Jan Kara

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,

On Tue 28-07-09 18:41:42, Sylvain Rochet wrote:
> On Tue, Jul 28, 2009 at 03:52:26PM +0200, Jan Kara wrote:
> > On Tue 28-07-09 13:27:15, Sylvain Rochet wrote:
> > > On Mon, Jul 27, 2009 at 05:42:53PM +0200, Jan Kara wrote:
> > > > On Sat 25-07-09 17:17:52, Sylvain Rochet wrote:
> > > > > >
> > > > > > Can you still see the corruption with 2.6.30 kernel?
> > > > >
> > > > > Not upgraded yet, we'll give a try.
> > >
> > > Done, now featuring 2.6.30.3 ;)
> >
> > OK, drop me an email if you will see corruption also with this kernel.
>
> Let's move out the corrupted directory ;)
>
> [email protected]:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# rm -- * .ok
> rm: cannot remove `spip%3Farticle19.f8740dca': Input/output error
> [email protected]:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cd ..
> [email protected]:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache# mv e/ /data/lost+found/wooops
>
> > > > This is probably the misleading output from ext3_iget(). It should give
> > > > you EIO in the latest kernel.
> > >
> > > [email protected]:/data/web/ed/90/48/walotux.walon.org/htdocs/tmp/cache/e# cat spip%3Farticle19.f8740dca
> > > cat: spip%3Farticle19.f8740dca: Input/output error
> > >
> > > It makes much more sense now. We thought the problem was around NFS due
> > > to the previous error message; actually this is probably not the best
> > > path to look at.
> >
> > Yes, EIO makes more sense. I think the problem is NFS connected anyway
> > though :). But I don't have a clue how it can happen yet. Maybe I can try
> > adding some low-cost debugging checks if you'd be willing to run such
> > kernel...
>
> Without any problem, we have 24/7/365 physical access and we don't need
> to provide high-availability services.
>
> Anyway, the data hosted aren't that important, there is little or even
> no need for strict confidentiality, so we will be happy to provide ssh
> access to whoever would like to look deeper into this issue.
>
>
> > I'm adding to CC linux-nfs just in case someone has an idea.
> >
> > > > Ah, OK, here's the problem. The directory points to a file which is
> > > > obviously deleted (note the "Links: 0"). All the content of the inode seems
> > > > to indicate that the file was correctly deleted (you might check that the
> > > > corresponding bit in the bitmap is cleared via: "icheck 88541562").
> > >
> > > [email protected]:~# debugfs /dev/md10
> > > debugfs 1.40-WIP (14-Nov-2006)
> > > debugfs: icheck 88541562
> > > Block Inode number
> > > 88541562 <block not found>
> >
> > Ah, wrong debugfs command. I should have written:
> > testi <88541562>
>
> debugfs: testi <88541562>
> Inode 88541562 is not in use

OK, I've found some time and written the debugging patch. Hopefully it
will tell us more. It should output messages to the kernel log if it
finds something suspicious - like:
No dentry for unlinked inode...
Dentry ... for unlinked inode ... has no parent
Found directory entry ... for unlinked inode

When you see such messages in the log, send them to me please. Also
attach the System.map file so that I can translate the address where
i_nlink was dropped - for that ext3 should be compiled into the kernel
(should not be a module). Thanks a lot for testing.

Honza

--
Jan Kara <[email protected]>
SUSE Labs, CR
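
[Editor's sketch, not part of the original mail] The request for System.map above is about turning a raw kernel text address (such as the one logged where i_nlink was dropped) into a function name plus offset. A minimal way to do that lookup is sketched below, assuming the usual System.map layout of "address type name" per line. The sample addresses are made up for illustration and do not come from the reporter's kernel; load_system_map and resolve are hypothetical helper names.

```python
import bisect

# Hypothetical excerpt; real entries come from the System.map of the running kernel.
SAMPLE_MAP = """\
c01a2000 T ext3_rmdir
c01a2200 T ext3_unlink
c01a2400 T ext3_symlink
"""


def load_system_map(lines):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return (name, offset) for the nearest symbol at or below address, else None."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    base, name = symbols[index]
    return name, address - base


if __name__ == "__main__":
    table = load_system_map(SAMPLE_MAP.splitlines())
    name, offset = resolve(table, 0xC01A2234)
    print(f"{name}+{offset:#x}")
```

With a real file, the table would be built from `open("System.map-2.6.30.4")` instead of the sample string. The lookup is only meaningful for code built into the kernel image, which is why ext3 must not be a module here.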


Attachments:
0001-ext3-Debug-unlinking-of-inodes.patch (3.48 kB)

2009-08-04 11:15:06

by Sylvain Rochet

Subject: Re: 2.6.28.9: EXT3/NFS inodes corruption

Hi,


On Tue, Aug 04, 2009 at 12:29:01AM +0200, Jan Kara wrote:
>
> OK, I've found some time and written the debugging patch. Hopefully it
> will tell us more. It should output messages to the kernel log if it
> finds something suspicious - like:
> No dentry for unlinked inode...
> Dentry ... for unlinked inode ... has no parent
> Found directory entry ... for unlinked inode
>
> When you see such messages in the log, send them to me please. Also
> attach the System.map file so that I can translate the address where
> i_nlink was dropped - for that ext3 should be compiled into the kernel
> (should not be a module). Thanks a lot for testing.

Patch applied.

And there is already a lot of output.

http://edony.tuxfamily.org/~grad/bazooka/System.map-2.6.30.4
http://edony.tuxfamily.org/~grad/bazooka/config-2.6.30.4
http://edony.tuxfamily.org/~grad/bazooka/kern.log


Sylvain

