2008-11-14 01:05:09

by Kelly Kane

[permalink] [raw]
Subject: Call traces on console from a test machine

We have a production (yay!) ext4 server which has started spewing
ext4_da_writepages errors on the console. The only change anyone can
think of is that we started doing rsync backups of the machine to
another. Perhaps this heavy I/O on user home directories is causing the
problem?

> avg-cpu: %user %nice %system %iowait %steal %idle
> 51.46 20.91 19.90 0.63 0.00 7.10
>
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
> sda 0.00 165.48 0.00 40.61 0.00 1648.73 40.60 1.28 31.45 0.90 3.65
> sda1 0.00 165.48 0.00 40.61 0.00 1648.73 40.60 1.28 31.45 0.90 3.65
> sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> sda3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
> sdb 0.51 0.00 31.98 0.00 795.94 0.00 24.89 0.27 8.76 6.73 21.52
> sdb1 0.51 0.00 31.98 0.00 795.94 0.00 24.89 0.27 8.76 6.73 21.52

The errors scrolling by pretty quickly on the serial console:

ext4_da_writepages: jbd2_start: 1024 pages, ino 3014931; err -30
Pid: 284, comm: pdflush Tainted: G W
2.6.27-serf-xeon-c6.1-ext4-grsec #1

Call Trace:
[<ffffffff8031d485>] ext4_da_writepages+0x2f5/0x320
[<ffffffff80227cc5>] __dequeue_entity+0x55/0x80
[<ffffffff80227d15>] set_next_entity+0x25/0x50
[<ffffffff8026f570>] do_writepages+0x20/0x40
[<ffffffff802b3717>] __writeback_single_inode+0x97/0x340
[<ffffffff8022787f>] update_curr+0x3f/0x60
[<ffffffff80227cc5>] __dequeue_entity+0x55/0x80
[<ffffffff802b3e17>] generic_sync_sb_inodes+0x217/0x320
[<ffffffff802b42ce>] writeback_inodes+0x7e/0xc0
[<ffffffff8026ffc6>] wb_kupdate+0xa6/0x120
[<ffffffff802704a0>] pdflush+0x0/0x220
[<ffffffff802704a0>] pdflush+0x0/0x220
[<ffffffff802705de>] pdflush+0x13e/0x220
[<ffffffff8026ff20>] wb_kupdate+0x0/0x120
[<ffffffff80246b6b>] kthread+0x4b/0x80
[<ffffffff80203789>] child_rip+0xa/0x11
[<ffffffff80246b20>] kthread+0x0/0x80
[<ffffffff8020377f>] child_rip+0x0/0x11

This is a vanilla 2.6.27 kernel + grsec + "2.6.27-ext4-2" patchset + the
following patch per Sandeen:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3c37fc86d20fe35be656f070997d62f75c2e4874;hp=8c9fa93d51123c5540762b1a9e1919d6f9c4af7c

Unfortunately I do not have a reproducible, yet, and the kernel is
monolithic. It hasn't been rebooted (yet!) so I can gather something
from the memory. If it crashes or proves unusable, though, I will have
to reboot it.

We also switched the fstab, but no one remembers remounting the
filesystem to be as follows:

/dev/sdb1 /home ext4
defaults,noatime,nodiratime,nosuid,nodev,errors=remount-ro,data=writeback
0 0

Prior it had no "data=" section.

Kelly


2008-11-14 02:17:27

by Kelly Kane

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

Kelly Kane wrote:
>
> This is a vanilla 2.6.27 kernel + grsec + "2.6.27-ext4-2" patchset + the
> following patch per Sandeen:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3c37fc86d20fe35be656f070997d62f75c2e4874;hp=8c9fa93d51123c5540762b1a9e1919d6f9c4af7c
>

I applied patch:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8c3f25d8950c3e9fe6c9849f88679b3f2a071550#patch1

and built a new kernel.

In the mean time, fsck.ext4 is not liking our filesystem.

Group descriptor 353 checksum is invalid. Fix? yes
Pass 2: CheckingEntry 'spice-and-wolf-si-tendra-segunda-temporada' in

/liamsito1234/cibercomics.com/wp-content/cache/supercache/http://www.cibercomics.com/2008/10/05
(2152305) references inode
2901322 in group 354 where _INODE_UNINIT is set.
Entry 'spice-and-wolf-si-tendra-segunda-temporada' in

/liamsito1234/cibercomics.com/wp-content/cache/supercache/http://www.cibercomics.com/2008/10/05
(2152305) has
deleted/unused inode 2901322. Clear? yes
/dev/sdb1: e2fsck canceled.
e2fsck: aborted

Same block group each run through, different inode.

I ran e2image and produced the following file:

http://ext4dev.dreamhosters.com/watanabe-home-2008-11-13-1712.e2image.bz2

Unfortunately I cannot mount the filesystem right now due to:

EXT4-fs: ext4_check_descriptors: Checksum for group 353 failed
(23397!=52411)
EXT4-fs: group descriptors corrupted!
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
missing codepage or other error In some cases useful info
is found in syslog - try
dmesg | tail or so

Is there any hope for recovery of any of this data?


Also, to append my prior email, this is the original error from the machine:

Nov 13 15:58:49 watanabe EXT4-fs error (device sdb1):
htree_dirblock_to_tree: bad entry in directory #51577: rec_len % 4 != 0
- offset=0, inode=943022112, rec_len=13614, name_len=52
Nov 13 15:58:49 watanabe Aborting journal on device sdb1:8.
Nov 13 15:58:49 watanabe Remounting filesystem read-only


Kelly


2008-11-14 10:36:05

by Andreas Dilger

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

On Nov 13, 2008 18:16 -0800, Kelly Kane wrote:
> In the mean time, fsck.ext4 is not liking our filesystem.
>
> Group descriptor 353 checksum is invalid. Fix? yes
> Pass 2: CheckingEntry 'spice-and-wolf-si-tendra-segunda-temporada' in
>

> /liamsito1234/cibercomics.com/wp-content/cache/supercache/http://www.cibercomics.com/2008/10/05
> (2152305) references inode
> 2901322 in group 354 where _INODE_UNINIT is set.
> Entry 'spice-and-wolf-si-tendra-segunda-temporada' in
>

> /liamsito1234/cibercomics.com/wp-content/cache/supercache/http://www.cibercomics.com/2008/10/05
> (2152305) has
> deleted/unused inode 2901322. Clear? yes
> /dev/sdb1: e2fsck canceled.
> e2fsck: aborted
>
> Same block group each run through, different inode.
>
> I ran e2image and produced the following file:
>
> http://ext4dev.dreamhosters.com/watanabe-home-2008-11-13-1712.e2image.bz2
>
> Unfortunately I cannot mount the filesystem right now due to:
>
> EXT4-fs: ext4_check_descriptors: Checksum for group 353 failed
> (23397!=52411)
> EXT4-fs: group descriptors corrupted!
> mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
> missing codepage or other error In some cases useful info
> is found in syslog - try
> dmesg | tail or so
>
> Is there any hope for recovery of any of this data?

You can run "tune2fs -O ^uninit_bg /dev/XXX" to disable the group
descriptor checksums (and, unfortunately, the fast fsck benefits),
but at least it will allow you to mount the filesystem.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.


2008-11-14 17:09:34

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

On Thu, Nov 13, 2008 at 06:16:48PM -0800, Kelly Kane wrote:
> /liamsito1234/cibercomics.com/wp-content/cache/supercache/http://www.cibercomics.com/2008/10/05
> (2152305) has
> deleted/unused inode 2901322. Clear? yes
> /dev/sdb1: e2fsck canceled.
> e2fsck: aborted
>
> Same block group each run through, different inode.

That's very strange... and it's also strange that it would be
constantly changing. What version of e2fsprogs are you using?

> I ran e2image and produced the following file:
>
> http://ext4dev.dreamhosters.com/watanabe-home-2008-11-13-1712.e2image.bz2

Unfortunately, I need a raw image dump in order to be able to debug
this. Can you create it using:

e2image -r /dev/sdXX - | bzip2 > /tmp/sdXX.e2i.bz2

... and send it to me?

What's the underlying hardware for this filesystem, and I hate to ask
it, but are you sure it's not a hardware problem?

One thing you can do is to use e2image to create a sparse file on
another filesystem:

e2image -r /dev/sdXX /u2/sdXX.e2i

And then try to use e2fsck to check the raw e2image file. If it is
successful, then but yet you keep getting failures with different
block groups and different inodes, it may be a hardware problem.

Regards,

- Ted

2008-11-14 20:30:05

by Kelly Kane

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

Andreas Dilger wrote:

>
> You can run "tune2fs -O ^uninit_bg /dev/XXX" to disable the group
> descriptor checksums (and, unfortunately, the fast fsck benefits),
> but at least it will allow you to mount the filesystem.

Thank you, we'll keep that in mind in case there is a next time. (Knock
on wood.)

Theodore Tso wrote:
> On Thu, Nov 13, 2008 at 06:16:48PM -0800, Kelly Kane wrote:
>> Same block group each run through, different inode.
>
> That's very strange... and it's also strange that it would be
> constantly changing. What version of e2fsprogs are you using?

It eventually started picking the same file each time. After about 5
runs through I thought I saw a trend. We are using e2fsprogs 1.41.3
compiled from source.

>> I ran e2image and produced the following file:
>>
>> http://ext4dev.dreamhosters.com/watanabe-home-2008-11-13-1712.e2image.bz2
>
> Unfortunately, I need a raw image dump in order to be able to debug
> this. Can you create it using:
>
> e2image -r /dev/sdXX - | bzip2 > /tmp/sdXX.e2i.bz2
>
> ... and send it to me?

Sure. Once we finish restoring the data off the old partition I can do
this. If that file is of no use I will delete the source files. The
filesystem has been more heavily modified since the above image was
created, details below.

> What's the underlying hardware for this filesystem, and I hate to ask
> it, but are you sure it's not a hardware problem?
>
> One thing you can do is to use e2image to create a sparse file on
> another filesystem:
>
> e2image -r /dev/sdXX /u2/sdXX.e2i

We have of course not ruled anything out. We will put the hardware
through its paces before wiping it. I'll give this a go later as well.

The server is an intel xeon server on a supermicro board with a 3ware
9690SE-8LPML w/ BBU, using Seagate 1tb 7200.11 disks. (ST31000340AS) We
have an identical test server setup called `ext4dev` for use in
troubleshooting this stuff. Sandeen has direct access to this machine,
more accounts can be provided by him or myself via private email. It has
a copy of our kernel source tree, sudo access, etc.

I eventually got the filesystem mounted by answering "yes" to fix the
block group checksum error, "no" to all of the Pass 2 inode errors, and
"yes" to the pass 5 "fix bitmap inconsistency" (or something) error. It
does not cleanly fsck, still, but I can read data off it.

Thanks again,
Kelly

2008-11-17 18:58:43

by Kelly Kane

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

Theodore Tso wrote:
> e2image -r /dev/sdXX - | bzip2 > /tmp/sdXX.e2i.bz2
>
> ... and send it to me?

http://ext4dev.dreamhosters.com/sdb1.e2i.bz2

The above is the e2image requested, though it has been fsck'd a handful of
times as detailed in the other post. The main change is I got fsck.ext4 to
correct the block group checksum error. The pass 2 errors still make fsck
exit early claiming to be aborted.

Kelly

2008-11-17 20:31:53

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

On Mon, Nov 17, 2008 at 10:58:42AM -0800, Kelly Kane wrote:
> Theodore Tso wrote:
>> e2image -r /dev/sdXX - | bzip2 > /tmp/sdXX.e2i.bz2
>>
>> ... and send it to me?
>
> http://ext4dev.dreamhosters.com/sdb1.e2i.bz2

Yikes, it is big. Thanks.... it may take me a few days to find a
place to be able to store it and uncompress it, especially since I'm
travelling at the moment. I do really appreciate your creating the
raw e2image file. This should allow me to track down this problem:

> The pass 2 errors still make fsck exit early claiming to be aborted.

Thanks again!!!

- Ted

2008-11-17 21:20:29

by Eric Sandeen

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

Theodore Tso wrote:
> On Mon, Nov 17, 2008 at 10:58:42AM -0800, Kelly Kane wrote:
>> Theodore Tso wrote:
>>> e2image -r /dev/sdXX - | bzip2 > /tmp/sdXX.e2i.bz2
>>>
>>> ... and send it to me?
>> http://ext4dev.dreamhosters.com/sdb1.e2i.bz2
>
> Yikes, it is big. Thanks.... it may take me a few days to find a
> place to be able to store it and uncompress it, especially since I'm
> travelling at the moment. I do really appreciate your creating the
> raw e2image file. This should allow me to track down this problem:
>
>> The pass 2 errors still make fsck exit early claiming to be aborted.
>
> Thanks again!!!

Ted, what do you think about putting make-sparse.c into e2fsprogs?
Might be handy for this sort of thing, rather than searching the
innertubes for it each time I need it ;)

(Now, we'll see if ext4 can cope with the bazillions of extents this
will create...)

-Eric

> - Ted


2008-11-18 02:26:25

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Call traces on console from a test machine

On Mon, Nov 17, 2008 at 03:20:04PM -0600, Eric Sandeen wrote:
>
> Ted, what do you think about putting make-sparse.c into e2fsprogs?
> Might be handy for this sort of thing, rather than searching the
> innertubes for it each time I need it ;)

I've checked it into the e2fsprogs git tree, in the contrib directory.

Long-term, what I'm thinking we need to do is to steal qemu's block
device driver files (using a scheme similar to what I did with tdb),
and create a single an I/O manager function which supports qemu qcow2
volumes. This could be used as a much more efficient way of storing
raw e2image files (since it doesn't require sparse file support as
well as supporting in-line zlib compression), and also as a much more
efficient way of supporting "undo" functionality (by storing the
original version of the file as a cow snapshot). If we were to
implement it as an I/O manager, then we could even run fsck on virtual
filesystems used by KVM directly from the host OS (since KVM uses qemu
and therefore the qemu block device drivers as well).

If some Quarumnet^H^H^H^H^H^H^H^H^H Red Hat engineers has some free
time, that would be a great project for someone looking to get
involved in a filesystme project. Otherwise, it's on my TODO queue,
and I'll get to it one of these days....

- Ted