2013-02-27 15:43:11

by Dave Jones

Subject: EXT4 corruption on Linus latest tree.

Built from a pull around midnight EST last night.
(Don't have the git hash, as the source is on the disk that is now inaccessible.)

EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235381: block 152052288: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172228609: block 152051744: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0

This is a 3TB disk which has 1.9TB used.

I can see some files/dirs, but some top level dirs now appear empty.

About to reboot back to a safe kernel and fsck.

Dave


2013-02-27 15:55:39

by Borislav Petkov

Subject: Re: EXT4 corruption on Linus latest tree.

On Wed, Feb 27, 2013 at 10:43:11AM -0500, Dave Jones wrote:
> Built from a pull around midnight EST last night.
> (Don't have the git hash, as the source is on the disk that is now inaccessible.)
>
> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235381: block 152052288: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172228609: block 152051744: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
>
> This is a 3TB disk which has 1.9TB used.
>
> I can see some files/dirs, but some top level dirs now appear empty.
>
> About to reboot back to a safe kernel and fsck.

Hmm, more people triggering something like that:
http://marc.info/?l=linux-kernel&m=136196926015305&w=2

--
Regards/Gruss,
Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

2013-02-27 16:04:46

by Dave Jones

Subject: Re: EXT4 corruption on Linus latest tree.

On Wed, Feb 27, 2013 at 04:55:39PM +0100, Borislav Petkov wrote:
> On Wed, Feb 27, 2013 at 10:43:11AM -0500, Dave Jones wrote:
> > Built from a pull around midnight EST last night.
> > (Don't have the git hash, as the source is on the disk that is now inaccessible.)
> >
> > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235381: block 152052288: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172228609: block 152051744: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> >
> > This is a 3TB disk which has 1.9TB used.
> >
> > I can see some files/dirs, but some top level dirs now appear empty.
> >
> > About to reboot back to a safe kernel and fsck.
>
> Hmm, more people triggering something like that:
> http://marc.info/?l=linux-kernel&m=136196926015305&w=2

Yeah, looks similar. The missing files/dirs reappeared when I
booted an older kernel, so it looks like the corruption doesn't
hit the disk. Fsck (1.42.5) didn't find anything either.

Dave

2013-02-27 16:32:03

by Theodore Ts'o

Subject: Re: EXT4 corruption on Linus latest tree.

On Wed, Feb 27, 2013 at 11:04:46AM -0500, Dave Jones wrote:
> > Hmm, more people triggering something like that:
> > http://marc.info/?l=linux-kernel&m=136196926015305&w=2
>
> Yeah, looks similar. The missing files/dirs reappeared when I
> booted an older kernel, so it looks like the corruption doesn't
> hit the disk. Fsck (1.42.5) didn't find anything either.

Thanks for the report. I'm working to replicate and fix the problem...

- Ted

2013-02-27 16:44:17

by Theodore Ts'o

Subject: Re: EXT4 corruption on Linus latest tree.

On Wed, Feb 27, 2013 at 11:04:46AM -0500, Dave Jones wrote:
> > > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
>
> Yeah, looks similar. The missing files/dirs reappeared when I
> booted an older kernel, so it looks like the corruption doesn't
> hit the disk. Fsck (1.42.5) didn't find anything either.

I suspect I see the problem... can you send me the results of

debugfs -R "stat <172235804>" /dev/sdb1

to confirm?

Thanks,

- Ted
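
For anyone following along, here is a minimal Python sketch of pulling the device, inode and block numbers out of an error line like the ones quoted above and building the matching debugfs invocation. The regular expression is written against the message format seen in this thread only, and the helper names (parse_ext4_error, debugfs_stat_command) are hypothetical, not part of any tool.

```python
import re

# Pattern for the ext4 error lines quoted in this thread; other kernel
# versions may format the message differently.
EXT4_ERR = re.compile(
    r"EXT4-fs error \(device (?P<dev>[^)]+)\): "
    r"(?P<func>\w+):(?P<line>\d+): "
    r"inode #(?P<inode>\d+): block (?P<block>\d+):"
)

def parse_ext4_error(msg):
    """Return (device, inode, block) from one error line, or None."""
    m = EXT4_ERR.search(msg)
    if m is None:
        return None
    return m.group("dev"), int(m.group("inode")), int(m.group("block"))

def debugfs_stat_command(device, inode):
    """Build the debugfs command line requested above (hypothetical helper)."""
    return 'debugfs -R "stat <%d>" /dev/%s' % (inode, device)

line = ("EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: "
        "inode #172235804: block 152052301: comm ls: bad entry in directory: "
        "rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, "
        "name_len=0")
dev, inode, block = parse_ext4_error(line)
print(debugfs_stat_command(dev, inode))
```

The sketch only builds the command string; running debugfs against the device is left to the reader.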

2013-02-27 16:56:41

by Dave Jones

Subject: Re: EXT4 corruption on Linus latest tree.

On Wed, Feb 27, 2013 at 11:44:17AM -0500, Theodore Ts'o wrote:
> On Wed, Feb 27, 2013 at 11:04:46AM -0500, Dave Jones wrote:
> > > > EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
> >
> > Yeah, looks similar. The missing files/dirs reappeared when I
> > booted an older kernel, so it looks like the corruption doesn't
> > hit the disk. Fsck (1.42.5) didn't find anything either.
>
> I suspect I see the problem... can you send me the results of
>
> debugfs -R "stat <172235804>" /dev/sdb1
>
> to confirm?

debugfs 1.42.5 (29-Jul-2012)
Inode: 172235804 Type: directory Mode: 0775 Flags: 0x80000
Generation: 1174354732 Version: 0x00000000:00000055
User: 1000 Group: 1000 Size: 4096
File ACL: 0 Directory ACL: 0
Links: 24 Blockcount: 8
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x512d0fdc:71fe3000 -- Tue Feb 26 14:41:16 2013
atime: 0x512d106e:5a143300 -- Tue Feb 26 14:43:42 2013
mtime: 0x512d0fdc:71fe3000 -- Tue Feb 26 14:41:16 2013
crtime: 0x50e76c35:1d69aad8 -- Fri Jan 4 18:56:37 2013
Size of extra inode fields: 28
Extended attributes stored in inode body:
selinux = "unconfined_u:object_r:file_t:s0\000" (32)
EXTENTS:
(0):688923213


That took about 2 minutes to run btw, expected ?

Dave
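
The EXTENTS section of the output above gives the logical-to-physical block mapping for the directory. A rough Python sketch of reading it follows; it assumes entries of the form "(logical):physical" as shown above, and additionally assumes that multi-block extents appear as ranges like "(0-3):100-103". The parse_extents helper is hypothetical.

```python
import re

# Matches "(0):688923213" and, as an assumption, ranges like "(0-3):100-103".
EXTENT = re.compile(r"\((\d+)(?:-(\d+))?\):(\d+)(?:-(\d+))?")

def parse_extents(text):
    """Map each logical block to its physical block from an EXTENTS listing."""
    mapping = {}
    for m in EXTENT.finditer(text):
        first_logical = int(m.group(1))
        last_logical = int(m.group(2)) if m.group(2) else first_logical
        first_physical = int(m.group(3))
        for offset in range(last_logical - first_logical + 1):
            mapping[first_logical + offset] = first_physical + offset
    return mapping

extents = parse_extents("EXTENTS:\n(0):688923213\n")
reported_by_kernel = 152052301  # block number from the EXT4-fs error line

# The directory has a single block, so logical block 0 is the one being read.
print(extents[0], extents[0] == reported_by_kernel)
```

For this inode the on-disk mapping and the block number in the kernel message disagree, which is the mismatch discussed in the replies.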




2013-02-27 17:07:10

by Zheng Liu

Subject: Re: EXT4 corruption on Linus latest tree.

On 2013-2-28, at 12:56, Dave Jones <[email protected]> wrote:

> On Wed, Feb 27, 2013 at 11:44:17AM -0500, Theodore Ts'o wrote:
>> On Wed, Feb 27, 2013 at 11:04:46AM -0500, Dave Jones wrote:
>>>>> EXT4-fs error (device sdb1): htree_dirblock_to_tree:919: inode #172235804: block 152052301: comm ls: bad entry in directory: rec_len is smaller than minimal - offset=0(0), inode=0, rec_len=0, name_len=0
>>>
>>> Yeah, looks similar. The missing files/dirs reappeared when I
>>> booted an older kernel, so it looks like the corruption doesn't
>>> hit the disk. Fsck (1.42.5) didn't find anything either.
>>
>> I suspect I see the problem... can you send me the results of
>>
>> debugfs -R "stat <172235804>" /dev/sdb1
>>
>> to confirm?
>
> debugfs 1.42.5 (29-Jul-2012)
> Inode: 172235804 Type: directory Mode: 0775 Flags: 0x80000
> Generation: 1174354732 Version: 0x00000000:00000055
> User: 1000 Group: 1000 Size: 4096
> File ACL: 0 Directory ACL: 0
> Links: 24 Blockcount: 8
> Fragment: Address: 0 Number: 0 Size: 0
> ctime: 0x512d0fdc:71fe3000 -- Tue Feb 26 14:41:16 2013
> atime: 0x512d106e:5a143300 -- Tue Feb 26 14:43:42 2013
> mtime: 0x512d0fdc:71fe3000 -- Tue Feb 26 14:41:16 2013
> crtime: 0x50e76c35:1d69aad8 -- Fri Jan 4 18:56:37 2013
> Size of extra inode fields: 28
> Extended attributes stored in inode body:
> selinux = "unconfined_u:object_r:file_t:s0\000" (32)
> EXTENTS:
> (0):688923213
>
>
> That took about 2 minutes to run btw, expected ?

Hi Dave and Ted,

Thanks for the report. From the result, I think the extent status tree is the root cause, because of a wrong logical-to-physical block mapping. I am very sorry about that. I will try to fix the bug ASAP.

Ted, I am not sure whether we need to revert the patch or whether you can give me some time to fix it.

Thanks!!
- Zheng-

2013-02-27 18:03:02

by Theodore Ts'o

Subject: Re: EXT4 corruption on Linus latest tree.

On Thu, Feb 28, 2013 at 01:07:10AM +0800, gnehzuil.liu wrote:
>
> Thanks for the report. From the result, I think the extent status tree
> is the root cause, because of a wrong logical-to-physical block mapping.
> I am very sorry about that. I will try to fix the bug ASAP.

Here's a hint as to what's going on:

% bc
bc 1.06.95
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
obase=16
# This is the block number printed in the error message
152052301
910224D

# This is the block number reported by debugfs
688923213
2910224D

It looks like something in the code is masking the block number down to
its low-order bits (the two values differ only in bit 29, 0x20000000), so
we're losing the higher bits from the physical block number. That
should be pretty easy to find and fix....

- Ted
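
The comparison in the bc session above can be reproduced with a short Python sketch. It XORs the two block numbers from the thread to show which bits differ. The mask computed at the end is only the widest low-bit mask consistent with this single sample; it is an illustration, not a statement about what the kernel code actually does.

```python
reported = 152052301   # block number printed in the EXT4-fs error message
on_disk = 688923213    # block number reported by debugfs

diff = reported ^ on_disk
print(hex(reported), hex(on_disk), hex(diff))

# Positions of the bits that differ between the two values.
lost_bits = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
print(lost_bits)

# Widest low-bit mask consistent with this one sample: keeping everything
# below the lowest lost bit turns the on-disk value into the reported one.
mask = (1 << min(lost_bits)) - 1
print(hex(mask), (on_disk & mask) == reported)
```

Only one bit differs, so the kernel's value is the on-disk block number with a single high bit dropped.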