2011-10-19 16:02:29

by Johannes Segitz

Subject: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

Hello,

Yesterday I was forced to start a fsck of an ext4 filesystem (4 TB on
an encrypted raid5 array). After a while I got a lot
of these messages:
Inode 23565579 should not have EOFBLOCKS_FL set (size 0, lblk -1)

After some googling I found this thread:
http://kerneltrap.org/mailarchive/linux-ext4/2010/8/19/6885408/thread#mid-6885408

Since it's something that can be taken care of by using "-p", I started
it yesterday and was kind of surprised
to discover it running happily today with no sign of stopping. I piped
the output to /dev/null since the printing
of the messages alone caused quite a bit of load, so I don't know which
inode fsck is currently at.

Is there a way to speed things up? If I understand the thread
correctly, those errors should self-correct over time,
and I don't want to wait anymore. Can I do any harm by killing fsck
and starting it again without the pipe to see
which inode it is currently at?

Bye,
Johannes


2011-10-19 16:21:24

by Andreas Dilger

Subject: Re: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

On 2011-10-19, at 10:02 AM, Johannes Segitz <[email protected]> wrote:

> Yesterday I was forced to start a fsck of an ext4 filesystem (4 TB on
> an encrypted raid5 array). After a while I got a lot
> of these messages:
> Inode 23565579 should not have EOFBLOCKS_FL set (size 0, lblk -1)
>
> After some googling I found this thread:
> http://kerneltrap.org/mailarchive/linux-ext4/2010/8/19/6885408/thread#mid-6885408
>
> Since it's something that can be taken care of by using "-p", I started
> it yesterday and was kind of surprised
> to discover it running happily today with no sign of stopping. I piped
> the output to /dev/null since the printing
> of the messages alone caused quite a bit of load, so I don't know which
> inode fsck is currently at.
>
> Is there a way to speed things up? If I understand the thread
> correctly, those errors should self-correct over time,
> and I don't want to wait anymore. Can I do any harm by killing fsck
> and starting it again without the pipe to see
> which inode it is currently at?

You could always strace e2fsck to see what it is printing.

Cheers, Andreas
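
One way to keep track of progress without printing every message (and without
losing everything to /dev/null) is to filter the output and report only every
N-th match. A minimal Python sketch, assuming the message format quoted above;
the function name `progress` and the `every` parameter are made up for
illustration, and the same pattern also matches when the text appears inside a
write() line captured by strace:

```python
import re

# Message text as quoted in this thread; only the inode number is captured.
EOFBLOCKS_RE = re.compile(r"Inode (\d+) should not have EOFBLOCKS_FL set")


def progress(lines, every=100000):
    """Yield (message_count, inode) once per `every` matching lines."""
    count = 0
    for line in lines:
        match = EOFBLOCKS_RE.search(line)
        if match:
            count += 1
            if count % every == 0:
                yield count, int(match.group(1))
```

It could be fed from a pipe, for example by looping over `progress(sys.stdin)`
and printing each pair to stderr, so that only one line per 100000 messages
reaches the terminal.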

2011-10-19 18:53:48

by Theodore Ts'o

Subject: Re: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

On Wed, Oct 19, 2011 at 06:02:12PM +0200, Johannes Segitz wrote:
> Hello,
>
> Yesterday I was forced to start a fsck of an ext4 filesystem (4 TB on
> an encrypted raid5 array). After a while I got a lot
> of these messages:
> Inode 23565579 should not have EOFBLOCKS_FL set (size 0, lblk -1)
>
> After some googling I found this thread:
> http://kerneltrap.org/mailarchive/linux-ext4/2010/8/19/6885408/thread#mid-6885408

What kernel version are you using, and can you upgrade to one that has
this bug fixed? This is a problem which was fixed over a year ago...

> Since it's something that can be taken care of by using "-p", I started
> it yesterday and was kind of surprised
> to discover it running happily today with no sign of stopping. I piped
> the output to /dev/null since the printing
> of the messages alone caused quite a bit of load, so I don't know which
> inode fsck is currently at.
>
> Is there a way to speed things up? If I understand the thread
> correctly, those errors should self-correct over time,
> and I don't want to wait anymore. Can I do any harm by killing fsck
> and starting it again without the pipe to see
> which inode it is currently at?

What version of e2fsprogs are you using? Given that you're using an
old version of the kernel, there's a good chance you're using an old
version of e2fsprogs. Are you willing to upgrade to a newer kernel
and e2fsprogs? If so, the procedure documented in the following
commit, which is included in e2fsprogs 1.41.13 or newer, should help
you out (see below).

- Ted

commit 75990388365c5688dbade9c33a3394e40f757526
Author: Theodore Ts'o <[email protected]>
Date: Mon Dec 6 10:10:33 2010 -0500

    e2fsck: Add the ability to force a problem to not be fixed

    The boolean option "force_no" in the problems stanza of e2fsck.conf
    allows a particular problem code to be treated as if the user will answer
    "no" to the question of whether a particular problem should be fixed
    --- even if e2fsck is run with the -y option.

    As an example use case, suppose a distribution had widely deployed a
    version of the kernel where under some circumstances, the EOFBLOCKS_FL
    flag would be left set even though it should not be left set, and a
    customer had a workload which exercised the fencepost error all the
    time, resulting in a large number of inodes that had EOFBLOCKS_FL
    set erroneously. Enough, in fact, that the e2fsck runs were taking too
    long. (There was such a bug in the kernel, which was fixed by commit
    58590b06d in 2.6.36).

    Leaving EOFBLOCKS_FL set when it should not be isn't a huge deal, and
    is certainly better than having high availability timeout alerts going
    off left and right. So in this case, the best fix might be to put the
    following in /etc/e2fsck.conf:

    [problems]
    0x010060 = {    # PR_1_EOFBLOCKS_FL_SET
        force_no = true
        no_ok = true
        no_nomsg = true
    }

    Signed-off-by: "Theodore Ts'o" <[email protected]>
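
For anyone who wants to try the stanza from a scratch file first, here is a
small Python sketch that renders it as text. The helper name `problem_stanza`
is made up for illustration; the problem code, its name and the three boolean
relations are taken from the commit message above, and nothing here is part of
e2fsprogs itself:

```python
def problem_stanza(code, name, options=("force_no", "no_ok", "no_nomsg")):
    """Render a [problems] stanza that sets each listed boolean to true."""
    lines = ["[problems]", f"{code} = {{\t# {name}"]
    lines += [f"\t{opt} = true" for opt in options]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Text for the problem discussed in this thread.
STANZA = problem_stanza("0x010060", "PR_1_EOFBLOCKS_FL_SET")
```

Writing `STANZA` to a file of your choice and pointing the `E2FSCK_CONFIG`
environment variable at it (described in the e2fsck(8) man page as selecting
the configuration file) should let e2fsck pick it up without editing
/etc/e2fsck.conf; check the man page of your installed version before relying
on that.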


2011-10-20 07:49:08

by Johannes Segitz

Subject: Re: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

On Wed, Oct 19, 2011 at 18:22, Andreas Dilger <[email protected]> wrote:
> You could always strace e2fsck to see what it is printing.

I tried that, but I can't see which inode is currently being processed:

<snip fcntl lines>
fcntl(5, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=556, len=1}) = 0
fcntl(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=284, len=1}) = 0
fcntl(5, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=164, len=1}) = 0
fstat(5, {st_mode=S_IFREG|0600, st_size=376758272, ...}) = 0
munmap(0x7fe2aee4e000, 376758272) = 0
ftruncate(5, 376762368) = 0
pwrite(5, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"..., 1024, 376758272) = 1024
pwrite(5, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"..., 1024, 376759296) = 1024
pwrite(5, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"..., 1024, 376760320) = 1024
pwrite(5, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"..., 1024, 376761344) = 1024
mmap(NULL, 376762368, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x7fe2aee4d000

Rinse, repeat.

Bye,
Johannes
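
Reading the excerpt above: none of the calls shown write to stdout or stderr,
which is why no inode number shows up. Each cycle unmaps the file on
descriptor 5, extends it with ftruncate, fills the new region with pwrite and
maps it again. A small Python sketch that does the arithmetic; the helper name
`growth_per_cycle` is made up, and the trace is copied from the message with
the "BBBB..." payload shortened:

```python
import re

# One cycle from the strace output above (pwrite payload abbreviated).
TRACE = """\
fstat(5, {st_mode=S_IFREG|0600, st_size=376758272, ...}) = 0
munmap(0x7fe2aee4e000, 376758272) = 0
ftruncate(5, 376762368) = 0
pwrite(5, "BBBB"..., 1024, 376758272) = 1024
pwrite(5, "BBBB"..., 1024, 376759296) = 1024
pwrite(5, "BBBB"..., 1024, 376760320) = 1024
pwrite(5, "BBBB"..., 1024, 376761344) = 1024
mmap(NULL, 376762368, PROT_READ|PROT_WRITE, MAP_SHARED, 5, 0) = 0x7fe2aee4d000
"""


def growth_per_cycle(trace):
    """Return (bytes the file grew by, bytes written by pwrite) for one cycle."""
    old_size = int(re.search(r"st_size=(\d+)", trace).group(1))
    new_size = int(re.search(r"ftruncate\(\d+, (\d+)\)", trace).group(1))
    written = sum(
        int(n) for n in re.findall(r"^pwrite\(.*\) = (\d+)$", trace, re.M)
    )
    return new_size - old_size, written
```

For this excerpt both numbers come out as 4096, so a file of roughly 359 MiB
is being unmapped and remapped for every 4 KiB added to it.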

2011-10-20 10:59:08

by Johannes Segitz

Subject: Re: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

On Wed, Oct 19, 2011 at 20:53, Ted Ts'o <[email protected]> wrote:
> On Wed, Oct 19, 2011 at 06:02:12PM +0200, Johannes Segitz wrote:
> What kernel version are you using, and can you upgrade to one that has
> this bug fixed? This is a problem which was fixed over a year ago...

2.6.38-11-generic #50-Ubuntu SMP

I was running 3.0.4 until a few days ago.

I haven't run fsck on the filesystem for quite a while, and the files
on this volume don't get rewritten, so the problem doesn't fix itself.
I think it was caused some time ago and simply persists.

> What version of e2fsprogs are you using?

1.41.14-1ubuntu3 which seems to be the newest version

>    As an example use case, suppose a distribution had widely deployed a
>    version of the kernel where under some circumstances, the EOFBLOCKS_FL
>    flag would be left set even though it should not be left set, and a
>    customer had a workload which exercised the fencepost error all the
>    time, resulting in a large number of inodes that had EOFBLOCKS_FL
>    set erroneously.

Yeah, "suppose" ;)

>    Leaving EOFBLOCKS_FL set when it should not be isn't a huge deal, and
>    is certainly better than having high availability timeout alerts going
>    off left and right. So in this case, the best fix might be to put the
>    following in /etc/e2fsck.conf:
>
>    [problems]
>    0x010060 = {    # PR_1_EOFBLOCKS_FL_SET
>        force_no = true
>        no_ok = true
>        no_nomsg = true
>    }

That was pretty much what I was looking for, thank you. I'll kill fsck
tonight if it's still running and run it again with those settings.

Thank you for your help
Johannes

2011-10-20 18:59:59

by Andreas Dilger

Subject: Re: fsck.ext4 taking a very long time because of "should not have EOFBLOCKS_FL set"

On 2011-10-20, at 4:58 AM, Johannes Segitz wrote:
> On Wed, Oct 19, 2011 at 20:53, Ted Ts'o <[email protected]> wrote:
>> Leaving EOFBLOCKS_FL set when it should not be isn't a huge deal, and
>> is certainly better than having high availability timeout alerts going
>> off left and right. So in this case, the best fix might be to put the
>> following in /etc/e2fsck.conf:
>>
>> [problems]
>> 0x010060 = {    # PR_1_EOFBLOCKS_FL_SET
>>     force_no = true
>>     no_ok = true
>>     no_nomsg = true
>> }
>
> That was pretty much what I was looking for, thank you. I'll kill fsck
> tonight if it's still running and run it again with those settings.

At least it should have fixed the inodes that it has already scanned
on disk, so the next time you get a chance to run it without the above
[problems] option, it should be able to continue from where it left off.

Cheers, Andreas