LinuxLists.cc - Filesystem state: clean with errors

2013-06-03 18:50:46

Subject: Filesystem state: clean with errors - what errors?

Executing dumpe2fs -h on one of the partitions says

...
Filesystem features: has_journal ext_attr resize_inode dir_index
filetype extent flex_bg sparse_super large_file huge_file uninit_bg
dir_nlink extra_isize
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean with errors
...

How can I find out what the errors are, i.e. the details of the errors?

Thanks

Autif

Some more logs (I am not sure I see an error here):

Group 0: (Blocks 0-32767) [ITABLE_ZEROED]
Checksum 0xc4c7, unused inodes 0
Primary superblock at 0, Group descriptors at 1-1
Reserved GDT blocks at 2-256
Block bitmap at 257 (+257), Inode bitmap at 273 (+273)
Inode table at 289-800 (+289)
21467 free blocks, 0 free inodes, 1834 directories
Free blocks: 11301-32767
Free inodes:
Group 1: (Blocks 32768-65535) [ITABLE_ZEROED]
Checksum 0xc9c2, unused inodes 623
Backup superblock at 32768, Group descriptors at 32769-32769
Reserved GDT blocks at 32770-33024
Block bitmap at 258 (bg #0 + 258), Inode bitmap at 274 (bg #0 + 274)
Inode table at 801-1312 (bg #0 + 801)
0 free blocks, 623 free inodes, 974 directories, 623 unused inodes
Free blocks:
Free inodes: 15762-16384
Group 2: (Blocks 65536-98303) [INODE_UNINIT, ITABLE_ZEROED]
...
Group 16: (Blocks 524288-557055) [ITABLE_ZEROED]
Checksum 0xcee5, unused inodes 0
Block bitmap at 524288 (+0), Inode bitmap at 524304 (+16)
Inode table at 524320-524831 (+32)
22714 free blocks, 0 free inodes, 1672 directories
Free blocks: 534342-557055
Free inodes:
Group 17: (Blocks 557056-589823) [ITABLE_ZEROED]
Checksum 0x9983, unused inodes 7375
Block bitmap at 524289 (bg #16 + 1), Inode bitmap at 524305 (bg #16 + 17)
Inode table at 524832-525343 (bg #16 + 544)
16730 free blocks, 7375 free inodes, 140 directories, 7375 unused inodes
Free blocks: 557571-558079, 565411-573439, 581632-589823
Free inodes: 140082-147456
Group 18: (Blocks 589824-622591) [INODE_UNINIT, BLOCK_UNINIT, ITABLE_ZEROED]
...

Everything else is INODE_UNINIT

2013-06-03 19:13:54

by Eric Sandeen

Subject: Re: Filesystem state: clean with errors - what errors?

On 6/3/13 1:45 PM, Autif Khan wrote:
> Executing dumpe2fs -h on one of the partitions says
>
> ...
> Filesystem features: has_journal ext_attr resize_inode dir_index
> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
> dir_nlink extra_isize
> Filesystem flags: signed_directory_hash
> Default mount options: user_xattr acl
> Filesystem state: clean with errors
> ...
>
> How can I find out what the errors are - the details of the errors.

"clean" means the log has been replayed (log is not dirty)
"with errors" means that it encountered consistency errors at runtime

run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
dry run)

-Eric

2013-06-03 19:30:58

by Autif Khan

Subject: Re: Filesystem state: clean with errors - what errors?

On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
> On 6/3/13 1:45 PM, Autif Khan wrote:
>> Executing dumpe2fs -h on one of the partitions says
>>
>> ...
>> Filesystem features: has_journal ext_attr resize_inode dir_index
>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>> dir_nlink extra_isize
>> Filesystem flags: signed_directory_hash
>> Default mount options: user_xattr acl
>> Filesystem state: clean with errors
>> ...
>>
>> How can I find out what the errors are - the details of the errors.
>
> "clean" means the log has been replayed (log is not dirty)
> "with errors" means that it encountered consistency errors at runtime
>
> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
> dry run)

--- spin ---

ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
fsck from util-linux 2.20.1
[/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
e2fsck 1.42 (29-Nov-2011)
Warning! /dev/sda5 is mounted.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
ubuntu@mac0013950af6fb:~$

I am not sure I see any errors. Is there an error here?

2013-06-03 19:34:03

by Eric Sandeen

Subject: Re: Filesystem state: clean with errors - what errors?

On 6/3/13 2:29 PM, Autif Khan wrote:
> On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
>> On 6/3/13 1:45 PM, Autif Khan wrote:
>>> Executing dumpe2fs -h on one of the partitions says
>>>
>>> ...
>>> Filesystem features: has_journal ext_attr resize_inode dir_index
>>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>>> dir_nlink extra_isize
>>> Filesystem flags: signed_directory_hash
>>> Default mount options: user_xattr acl
>>> Filesystem state: clean with errors
>>> ...
>>>
>>> How can I find out what the errors are - the details of the errors.
>>
>> "clean" means the log has been replayed (log is not dirty)
>> "with errors" means that it encountered consistency errors at runtime
>>
>> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
>> dry run)
>
> --- spin ---
>
> ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
> fsck from util-linux 2.20.1
> [/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
> e2fsck 1.42 (29-Nov-2011)
> Warning! /dev/sda5 is mounted.

Surprising that it didn't find errors since you ran it on a mounted fs!

That's also an older e2fsck, so I suppose it's possible that it missed
something.

> Pass 1: Checking inodes, blocks, and sizes
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> /dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
> ubuntu@mac0013950af6fb:~$
>
> I am not sure I see any errors. Is there an error here?

No, that didn't report any errors.

If you unmount it and do it w/o -n, it should clear the error state.
Perhaps it encountered an error for a file that got subsequently deleted,
or something - not sure.

-eric

2013-06-03 20:07:35

by Autif Khan

Subject: Re: Filesystem state: clean with errors - what errors?

On Mon, Jun 3, 2013 at 3:34 PM, Eric Sandeen <[email protected]> wrote:
> On 6/3/13 2:29 PM, Autif Khan wrote:
>> On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
>>> On 6/3/13 1:45 PM, Autif Khan wrote:
>>>> Executing dumpe2fs -h on one of the partitions says
>>>>
>>>> ...
>>>> Filesystem features: has_journal ext_attr resize_inode dir_index
>>>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>>>> dir_nlink extra_isize
>>>> Filesystem flags: signed_directory_hash
>>>> Default mount options: user_xattr acl
>>>> Filesystem state: clean with errors
>>>> ...
>>>>
>>>> How can I find out what the errors are - the details of the errors.
>>>
>>> "clean" means the log has been replayed (log is not dirty)
>>> "with errors" means that it encountered consistency errors at runtime
>>>
>>> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
>>> dry run)
>>
>> --- spin ---
>>
>> ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
>> fsck from util-linux 2.20.1
>> [/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
>> e2fsck 1.42 (29-Nov-2011)
>> Warning! /dev/sda5 is mounted.
>
> Surprising that it didn't find errors since you ran it on a mounted fs!
>
> That's also an older e2fsck, so I suppose it's possible that it missed
> something.
>
>> Pass 1: Checking inodes, blocks, and sizes
>> Pass 2: Checking directory structure
>> Pass 3: Checking directory connectivity
>> Pass 4: Checking reference counts
>> Pass 5: Checking group summary information
>> /dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
>> ubuntu@mac0013950af6fb:~$
>>
>> I am not sure I see any errors. Is there an error here?
>
> No, that didn't report any errors.
>
> If you unmount it and do it w/o -n, it should clear the error state.
> Perhaps it encountered an error for a file that got subsequently deleted,
> or something - not sure.
>

That is true. We are able to fix this almost trivially. The problem is
that we are triggering it frequently (sometimes every time) with
inexpensive SSDs. I am sure you have seen my other email:
http://marc.info/?l=linux-ext4&m=137028288823079&w=2

I assume that there is no other tool that I can use - (short of a hex
dump of the 4.0G partition using dd) - to further debug this. Is
there?

2013-06-03 20:14:29

by Eric Sandeen

Subject: Re: Filesystem state: clean with errors - what errors?

On 6/3/13 3:07 PM, Autif Khan wrote:
> On Mon, Jun 3, 2013 at 3:34 PM, Eric Sandeen <[email protected]> wrote:
>> On 6/3/13 2:29 PM, Autif Khan wrote:
>>> On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
>>>> On 6/3/13 1:45 PM, Autif Khan wrote:
>>>>> Executing dumpe2fs -h on one of the partitions says
>>>>>
>>>>> ...
>>>>> Filesystem features: has_journal ext_attr resize_inode dir_index
>>>>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>>>>> dir_nlink extra_isize
>>>>> Filesystem flags: signed_directory_hash
>>>>> Default mount options: user_xattr acl
>>>>> Filesystem state: clean with errors
>>>>> ...
>>>>>
>>>>> How can I find out what the errors are - the details of the errors.
>>>>
>>>> "clean" means the log has been replayed (log is not dirty)
>>>> "with errors" means that it encountered consistency errors at runtime
>>>>
>>>> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
>>>> dry run)
>>>
>>> --- spin ---
>>>
>>> ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
>>> fsck from util-linux 2.20.1
>>> [/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
>>> e2fsck 1.42 (29-Nov-2011)
>>> Warning! /dev/sda5 is mounted.
>>
>> Surprising that it didn't find errors since you ran it on a mounted fs!
>>
>> That's also an older e2fsck, so I suppose it's possible that it missed
>> something.
>>
>>> Pass 1: Checking inodes, blocks, and sizes
>>> Pass 2: Checking directory structure
>>> Pass 3: Checking directory connectivity
>>> Pass 4: Checking reference counts
>>> Pass 5: Checking group summary information
>>> /dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
>>> ubuntu@mac0013950af6fb:~$
>>>
>>> I am not sure I see any errors. Is there an error here?
>>
>> No, that didn't report any errors.
>>
>> If you unmount it and do it w/o -n, it should clear the error state.
>> Perhaps it encountered an error for a file that got subsequently deleted,
>> or something - not sure.
>>
>
> That is true - we are able to fix this - almost trivially - the
> problem is that we are causing this frequently (sometimes always) with
> inexpensive SSDs. I am sure you have seen my other email:
> http://marc.info/?l=linux-ext4&m=137028288823079&w=2
>
> I assume that there is no other tool that I can use - (short of a hex
> dump of the 4.0G partition using dd) - to further debug this. Is
> there?

"with errors" is printed when the fs state has EXT2_ERROR_FS set.
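
As a rough illustration of how that state string maps onto the on-disk flags, here is a minimal Python sketch. It assumes the usual ext2/3/4 layout (superblock 1024 bytes into the device, s_magic at offset 0x38, s_state at 0x3A, EXT2_VALID_FS = 0x0001, EXT2_ERROR_FS = 0x0002). The function names and the synthetic-superblock helper are hypothetical and are not part of e2fsprogs.

```python
import struct

# Assumed layout: the primary superblock starts 1024 bytes into the device,
# and the offsets below are relative to the start of the superblock.
SUPERBLOCK_OFFSET = 1024
S_MAGIC_OFFSET = 0x38    # __le16 s_magic, expected 0xEF53
EXT2_SUPER_MAGIC = 0xEF53
EXT2_VALID_FS = 0x0001   # set when the filesystem was unmounted cleanly
EXT2_ERROR_FS = 0x0002   # set when errors were detected


def describe_state(superblock: bytes) -> str:
    """Return a dumpe2fs-style description of the s_state field."""
    # s_magic (0x38) and s_state (0x3A) are adjacent little-endian 16-bit fields.
    magic, state = struct.unpack_from("<HH", superblock, S_MAGIC_OFFSET)
    if magic != EXT2_SUPER_MAGIC:
        raise ValueError(f"bad superblock magic 0x{magic:04x}")
    text = "clean" if state & EXT2_VALID_FS else "not clean"
    if state & EXT2_ERROR_FS:
        text += " with errors"
    return text


def read_superblock(path: str) -> bytes:
    """Read the 1024-byte primary superblock from a device or image file."""
    with open(path, "rb") as f:
        f.seek(SUPERBLOCK_OFFSET)
        return f.read(1024)


def make_fake_superblock(state: int) -> bytes:
    """Hypothetical helper: build a synthetic superblock for demonstration."""
    sb = bytearray(1024)
    struct.pack_into("<HH", sb, S_MAGIC_OFFSET, EXT2_SUPER_MAGIC, state)
    return bytes(sb)
```

On a real system, describe_state(read_superblock("/dev/sda5")) would need read access to the block device; the synthetic helper lets the decoding be tried without one.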

Looking at the system logs would be a start - when the filesystem was set
into error state, the kernel should have logged that fact, along with
why it did so... you could go from there.

Also, kernels since 2.6.36 save more info:

commit 1c13d5c0872870cca3e612aa045d492ead9ab004
Author: Theodore Ts'o <[email protected]>
Date: Tue Jul 27 11:56:03 2010 -0400

ext4: Save error information to the superblock for analysis

Save number of file system errors, and the time function name, line
number, block number, and inode number of the first and most recent
errors reported on the file system in the superblock.

Signed-off-by: "Theodore Ts'o" <[email protected]>

This gets printed daily:

    if (es->s_error_count)
        ext4_msg(sb, KERN_NOTICE, "error count: %u",
                 le32_to_cpu(es->s_error_count));
    if (es->s_first_error_time) {
        printk(KERN_NOTICE "EXT4-fs (%s): initial error at %u: %.*s:%d",
               sb->s_id, le32_to_cpu(es->s_first_error_time),
               (int) sizeof(es->s_first_error_func),
               es->s_first_error_func,
               le32_to_cpu(es->s_first_error_line));
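
The same saved fields can also be read straight from the superblock. The sketch below is an illustration only: the field order and the starting offset 0x194 are assumptions taken from the ext4 on-disk layout (struct ext4_super_block), and parse_error_info is a hypothetical name, not an existing tool.

```python
import struct
from datetime import datetime, timezone

# Assumed offset of s_error_count within the superblock.
ERROR_INFO_OFFSET = 0x194
# Assumed field order (all little-endian):
#   s_error_count, s_first_error_time, s_first_error_ino, s_first_error_block,
#   s_first_error_func[32], s_first_error_line,
#   s_last_error_time, s_last_error_ino, s_last_error_line, s_last_error_block,
#   s_last_error_func[32]
ERROR_INFO_FORMAT = "<IIIQ32sIIIIQ32s"


def _func_name(raw: bytes) -> str:
    """Decode a NUL-padded function name."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def _when(ts: int):
    """Convert a Unix timestamp to a UTC datetime, or None if unset."""
    return datetime.fromtimestamp(ts, tz=timezone.utc) if ts else None


def parse_error_info(superblock: bytes) -> dict:
    """Extract the saved first/last error records from a superblock buffer."""
    (count,
     f_time, f_ino, f_block, f_func, f_line,
     l_time, l_ino, l_line, l_block, l_func) = struct.unpack_from(
        ERROR_INFO_FORMAT, superblock, ERROR_INFO_OFFSET)
    return {
        "error_count": count,
        "first": {"time": _when(f_time), "func": _func_name(f_func),
                  "line": f_line, "ino": f_ino, "block": f_block},
        "last": {"time": _when(l_time), "func": _func_name(l_func),
                 "line": l_line, "ino": l_ino, "block": l_block},
    }
```

The input is the 1024-byte superblock, which starts 1024 bytes into the device. An error count of zero with an empty function name would suggest the kernel never recorded an error there.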

-Eric

2013-06-04 12:39:14

by Autif Khan

Subject: Re: Filesystem state: clean with errors - what errors?

On Mon, Jun 3, 2013 at 4:14 PM, Eric Sandeen <[email protected]> wrote:
> On 6/3/13 3:07 PM, Autif Khan wrote:
>> On Mon, Jun 3, 2013 at 3:34 PM, Eric Sandeen <[email protected]> wrote:
>>> On 6/3/13 2:29 PM, Autif Khan wrote:
>>>> On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
>>>>> On 6/3/13 1:45 PM, Autif Khan wrote:
>>>>>> Executing dumpe2fs -h on one of the partitions says
>>>>>>
>>>>>> ...
>>>>>> Filesystem features: has_journal ext_attr resize_inode dir_index
>>>>>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>>>>>> dir_nlink extra_isize
>>>>>> Filesystem flags: signed_directory_hash
>>>>>> Default mount options: user_xattr acl
>>>>>> Filesystem state: clean with errors
>>>>>> ...
>>>>>>
>>>>>> How can I find out what the errors are - the details of the errors.
>>>>>
>>>>> "clean" means the log has been replayed (log is not dirty)
>>>>> "with errors" means that it encountered consistency errors at runtime
>>>>>
>>>>> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
>>>>> dry run)
>>>>
>>>> --- spin ---
>>>>
>>>> ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
>>>> fsck from util-linux 2.20.1
>>>> [/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
>>>> e2fsck 1.42 (29-Nov-2011)
>>>> Warning! /dev/sda5 is mounted.
>>>
>>> Surprising that it didn't find errors since you ran it on a mounted fs!
>>>
>>> That's also an older e2fsck, so I suppose it's possible that it missed
>>> something.
>>>
>>>> Pass 1: Checking inodes, blocks, and sizes
>>>> Pass 2: Checking directory structure
>>>> Pass 3: Checking directory connectivity
>>>> Pass 4: Checking reference counts
>>>> Pass 5: Checking group summary information
>>>> /dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
>>>> ubuntu@mac0013950af6fb:~$
>>>>
>>>> I am not sure I see any errors. Is there an error here?
>>>
>>> No, that didn't report any errors.
>>>
>>> If you unmount it and do it w/o -n, it should clear the error state.
>>> Perhaps it encountered an error for a file that got subsequently deleted,
>>> or something - not sure.
>>>
>>
>> That is true - we are able to fix this - almost trivially - the
>> problem is that we are causing this frequently (sometimes always) with
>> inexpensive SSDs. I am sure you have seen my other email:
>> http://marc.info/?l=linux-ext4&m=137028288823079&w=2
>>
>> I assume that there is no other tool that I can use - (short of a hex
>> dump of the 4.0G partition using dd) - to further debug this. Is
>> there?
>
> "with errors" is printed when the fs state has EXT2_ERROR_FS set.
>
> Looking at the system logs would be a start - when the filesystem was set
> into error state, the kernel should have logged that fact, along with
> why it did so... you could go from there.
>
> Also, kernels since 2.6.36 save more info:
>

We are at 3.2.33, so we should be good.

However, I do not see "with errors" in either dmesg or /var/log/syslog.

Is there a kernel config that needs to be set to enable EXT2/3/4 logging?

2013-06-04 12:41:54

by Autif Khan

Subject: Re: Filesystem state: clean with errors - what errors?

On Tue, Jun 4, 2013 at 8:39 AM, Autif Khan <[email protected]> wrote:
> On Mon, Jun 3, 2013 at 4:14 PM, Eric Sandeen <[email protected]> wrote:
>> On 6/3/13 3:07 PM, Autif Khan wrote:
>>> On Mon, Jun 3, 2013 at 3:34 PM, Eric Sandeen <[email protected]> wrote:
>>>> On 6/3/13 2:29 PM, Autif Khan wrote:
>>>>> On Mon, Jun 3, 2013 at 3:13 PM, Eric Sandeen <[email protected]> wrote:
>>>>>> On 6/3/13 1:45 PM, Autif Khan wrote:
>>>>>>> Executing dumpe2fs -h on one of the partitions says
>>>>>>>
>>>>>>> ...
>>>>>>> Filesystem features: has_journal ext_attr resize_inode dir_index
>>>>>>> filetype extent flex_bg sparse_super large_file huge_file uninit_bg
>>>>>>> dir_nlink extra_isize
>>>>>>> Filesystem flags: signed_directory_hash
>>>>>>> Default mount options: user_xattr acl
>>>>>>> Filesystem state: clean with errors
>>>>>>> ...
>>>>>>>
>>>>>>> How can I find out what the errors are - the details of the errors.
>>>>>>
>>>>>> "clean" means the log has been replayed (log is not dirty)
>>>>>> "with errors" means that it encountered consistency errors at runtime
>>>>>>
>>>>>> run e2fsck -f on it to see what it finds (or e2fsck -fn if you want a no-op
>>>>>> dry run)
>>>>>
>>>>> --- spin ---
>>>>>
>>>>> ubuntu@mac0013950af6fb:~$ sudo fsck -V -n -f /dev/sda5
>>>>> fsck from util-linux 2.20.1
>>>>> [/sbin/fsck.ext4 (1) -- /koko] fsck.ext4 -n -f /dev/sda5
>>>>> e2fsck 1.42 (29-Nov-2011)
>>>>> Warning! /dev/sda5 is mounted.
>>>>
>>>> Surprising that it didn't find errors since you ran it on a mounted fs!
>>>>
>>>> That's also an older e2fsck, so I suppose it's possible that it missed
>>>> something.
>>>>
>>>>> Pass 1: Checking inodes, blocks, and sizes
>>>>> Pass 2: Checking directory structure
>>>>> Pass 3: Checking directory connectivity
>>>>> Pass 4: Checking reference counts
>>>>> Pass 5: Checking group summary information
>>>>> /dev/sda5: 24770/262144 files (0.1% non-contiguous), 328031/1048576 blocks
>>>>> ubuntu@mac0013950af6fb:~$
>>>>>
>>>>> I am not sure I see any errors. Is there an error here?
>>>>
>>>> No, that didn't report any errors.
>>>>
>>>> If you unmount it and do it w/o -n, it should clear the error state.
>>>> Perhaps it encountered an error for a file that got subsequently deleted,
>>>> or something - not sure.
>>>>
>>>
>>> That is true - we are able to fix this - almost trivially - the
>>> problem is that we are causing this frequently (sometimes always) with
>>> inexpensive SSDs. I am sure you have seen my other email:
>>> http://marc.info/?l=linux-ext4&m=137028288823079&w=2
>>>
>>> I assume that there is no other tool that I can use - (short of a hex
>>> dump of the 4.0G partition using dd) - to further debug this. Is
>>> there?
>>
>> "with errors" is printed when the fs state has EXT2_ERROR_FS set.
>>
>> Looking at the system logs would be a start - when the filesystem was set
>> into error state, the kernel should have logged that fact, along with
>> why it did so... you could go from there.
>>
>> Also, kernels since 2.6.36 save more info:
>>
>
> We are at 3.2.33, so we should be good.
>
> However, I do not see "with errors" in either dmesg or /var/log/syslog.
>
> Is there a kernel config that needs to be set to enable EXT2/3/4 logging?

I might have misunderstood - I should be searching for "error count"
and/or "initial error"

I cannot find these either. So, is there a kernel config that needs to be set?
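
For reference, a search like the one described could be scripted roughly as follows. This is a sketch under assumptions: the patterns are modelled on the two format strings in the kernel excerpt quoted earlier in the thread, the optional words are meant to tolerate the slightly different wording I believe later kernels use, and the sample log lines are invented for illustration.

```python
import re

# Modelled on: ext4_msg(sb, KERN_NOTICE, "error count: %u", ...)
ERROR_COUNT_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): error count(?: since last fsck)?: (?P<count>\d+)")
# Modelled on: "EXT4-fs (%s): initial error at %u: %.*s:%d"
INITIAL_ERROR_RE = re.compile(
    r"EXT4-fs \((?P<dev>[^)]+)\): initial error at (?:time )?(?P<time>\d+): "
    r"(?P<func>\w+):(?P<line>\d+)")

# Invented sample lines, not taken from the system discussed in this thread.
SAMPLE = [
    "Jun  4 08:12:01 host kernel: [86400.1] EXT4-fs (sda5): error count: 4",
    "Jun  4 08:12:01 host kernel: [86400.1] EXT4-fs (sda5): initial error at "
    "1370282888: ext4_lookup:1044: inode 12",
    "Jun  4 08:12:02 host kernel: [86401.0] usb 1-1: new high-speed USB device",
]


def scan_log(lines):
    """Return (kind, device, details) tuples for ext4 error-summary lines."""
    findings = []
    for line in lines:
        m = ERROR_COUNT_RE.search(line)
        if m:
            findings.append(("error count", m["dev"], {"count": int(m["count"])}))
            continue
        m = INITIAL_ERROR_RE.search(line)
        if m:
            findings.append(("initial error", m["dev"], {
                "time": int(m["time"]),
                "func": m["func"],
                "line": int(m["line"]),
            }))
    return findings
```

To scan a real log, pass an open file, for example scan_log(open("/var/log/syslog", errors="replace")). Note that the quoted kernel code only prints the count when s_error_count is non-zero, so an empty result does not by itself point to a missing kernel config option.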

2013-06-04 13:49:44

by Theodore Ts'o

Subject: Re: Filesystem state: clean with errors - what errors?

Hmm... what version of e2fsprogs are you using? Is there any chance
it's older than 1.42.4? Hmmm, yes, you're using a positively ancient
(and filled with bugs that have since been fixed) e2fsprogs 1.42.

I suspect you're getting hit by a problem which we fixed in e2fsprogs
1.42.4 (and you *REALLY* want to upgrade to the latest released
version of e2fsprogs):

Fixed e2fsck's handling of the journal's s_errno field. E2fsck was
not properly propagating the journal's s_errno field to the superblock
field; it was not checking this field if the journal had already been
replayed, and if the journal *was* being replayed, the "error bit"
wasn't getting flushed out to disk.
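
To make the two fields involved concrete, here is a small Python sketch that compares the journal's s_errno with the filesystem's error bit. It rests on assumptions: the jbd2 superblock is taken to be big-endian with magic 0xC03B3998 at offset 0 and s_errno at offset 0x20, the function names are hypothetical, and locating the journal superblock on a real device (it lives in the journal inode) is left out.

```python
import struct

JBD2_MAGIC = 0xC03B3998   # assumed jbd2 header magic (big-endian on disk)
J_ERRNO_OFFSET = 0x20     # assumed offset of s_errno in the journal superblock
EXT2_ERROR_FS = 0x0002    # error bit in the filesystem superblock's s_state


def journal_errno(journal_sb: bytes) -> int:
    """Return the signed s_errno value stored in a journal superblock buffer."""
    (magic,) = struct.unpack_from(">I", journal_sb, 0)
    if magic != JBD2_MAGIC:
        raise ValueError(f"bad journal magic 0x{magic:08x}")
    (errno_value,) = struct.unpack_from(">i", journal_sb, J_ERRNO_OFFSET)
    return errno_value


def classify(fs_state: int, journal_sb: bytes) -> str:
    """Describe how the journal error field relates to the fs error bit."""
    err = journal_errno(journal_sb)
    if err == 0:
        return "journal reports no error"
    if fs_state & EXT2_ERROR_FS:
        return "error recorded in both; journal s_errno still set"
    return "journal error not propagated to the superblock"


def make_fake_journal_sb(errno_value: int) -> bytes:
    """Hypothetical helper: build a synthetic journal superblock."""
    jsb = bytearray(1024)
    struct.pack_into(">I", jsb, 0, JBD2_MAGIC)
    struct.pack_into(">i", jsb, J_ERRNO_OFFSET, errno_value)
    return bytes(jsb)
```

The last case is the kind of mismatch the release note describes e2fsck as mishandling before 1.42.4; this sketch only reports it and repairs nothing.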

The kernel side fix for this particular issue (if this is what is
going on) is:

commit d796c52ef0b71a988364f6109aeb63d79c5b116b
Author: Theodore Ts'o <[email protected]>
Date: Sun Aug 5 19:04:57 2012 -0400

ext4: make sure the journal sb is written in ext4_clear_journal_err()

After we transfer set the EXT4_ERROR_FS bit in the file system
superblock, it's not enough to call jbd2_journal_clear_err() to clear
the error indication from journal superblock --- we need to call
jbd2_journal_update_sb_errno() as well. Otherwise, when the root file
system is mounted read-only, the journal is replayed, and the error
indicator is transferred to the superblock --- but the s_errno field
in the jbd2 superblock is left set (since although we cleared it in
memory, we never flushed it out to disk).

This can end up confusing e2fsck. We should make e2fsck more robust
in this case, but the kernel shouldn't be leaving things in this
confused state, either.

Signed-off-by: "Theodore Ts'o" <[email protected]>
Cc: [email protected]

... which first appeared in the 3.6 kernel, and which for some reason
was never backported to the 3.2 stable series.

Regards,

- Ted

2013-06-20 23:01:55

by Autif Khan

Subject: Re: Filesystem state: clean with errors - what errors?

I am happy to report that upgrading from 1.42 to 1.42.7 has resolved
most of the issues. There is still one vendor where we are getting
corruption and we will avoid that vendor. We are small fish.

Thanks a lot to everyone that helped - specifically Eric, Ted and DJW
