After a hard lockup and reboot, my test box (running recent Linus git
851b147) came up with:
[ 5.016854] EXT4-fs (sda1): mounted filesystem with ordered data mode
[ 8.809125] EXT4-fs (sda1): internal journal on sda1:8
[ 10.165239] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 524788
[ 10.165286] Aborting journal on device sda1:8.
[ 10.168111] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
[ 10.168169] EXT4-fs (sda1): Remounting filesystem read-only
[ 10.171614] EXT4-fs (sda1): Remounting filesystem read-only
Complete dmesg and kconfig attached. At the time of the lockup the box
was running the same kernel and X, with a kernel "make -j4" on sda1 and
a "dd if=/dev/zero bs=1M of=/dev/sdb1" to a USB thumb drive running
simultaneously; I'd
Ctrl-C'ed the dd about 30 seconds before the machine locked up (and the
dd was still in "D" so I'd attached an strace to it to see if anything
interesting would happen when it finally came back).
I haven't rebooted or remounted RW yet, happy to run whatever debug
commands would be useful.
Two issues here:
1. the lockup sucks, but I have no idea what to do about it. There's
nothing useful in the logs and it wasn't responding to SysRq and the
keyboard LEDs weren't blinking.
2. after a lockup the journal recovery should not fail.
-andy
On Mon, Sep 28, 2009 at 12:16:44PM -0700, Andy Isaacson wrote:
> After a hard lockup and reboot, my test box (running recent Linus git
> 851b147) came up with:
>
> [ 5.016854] EXT4-fs (sda1): mounted filesystem with ordered data mode
> [ 8.809125] EXT4-fs (sda1): internal journal on sda1:8
> [ 10.165239] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 524788
> [ 10.165286] Aborting journal on device sda1:8.
> [ 10.168111] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
> [ 10.168169] EXT4-fs (sda1): Remounting filesystem read-only
> [ 10.171614] EXT4-fs (sda1): Remounting filesystem read-only
It would be useful to see what pathname is associated with inode 524788.
You can use debugfs to find this out. For example, to find a pathname
which points to inode 14666, you can do this:
# debugfs /dev/sda1
debugfs 1.41.9 (22-Aug-2009)
debugfs: ncheck 14666
Inode Pathname
14666 /grub/menu.lst
Also, please try the debugfs stat command and send me the output:
debugfs: stat <14666>
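(If you'd rather not drive debugfs interactively, the same requests can
be issued one at a time with its -R option; a sketch, substituting the
inode number from your error message:

debugfs -R 'ncheck 524788' /dev/sda1
debugfs -R 'stat <524788>' /dev/sda1

ncheck and stat are read-only requests and debugfs opens the device
read-only by default, so this should be fine even with the filesystem
still mounted read-only.)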
> 2. after a lockup the journal recovery should not fail.
I'm not sure it was a matter of the journal recovery failing. All we
know for certain is that the filesystem was corrupted after the lockup
and the subsequent remount. What caused the filesystem corruption is
open to question at the moment; it could have been caused by the
lockup; or it could have been a file that was deleted right about the
time of the lockup; or it could have been some completely random
filesystem corruption.
It would be useful to know whether the inode in question was
supposed to have been deleted. If it was, it would be useful to know
if the dtime reported by debugfs's stat was around the time of the
original lockup.
- Ted
On Mon, Sep 28, 2009 at 04:25:07PM -0400, Theodore Tso wrote:
> On Mon, Sep 28, 2009 at 12:16:44PM -0700, Andy Isaacson wrote:
> > After a hard lockup and reboot, my test box (running recent Linus git
> > 851b147) came up with:
> >
> > [ 5.016854] EXT4-fs (sda1): mounted filesystem with ordered data mode
> > [ 8.809125] EXT4-fs (sda1): internal journal on sda1:8
> > [ 10.165239] EXT4-fs error (device sda1): ext4_lookup: deleted inode referenced: 524788
> > [ 10.165286] Aborting journal on device sda1:8.
> > [ 10.168111] EXT4-fs error (device sda1): ext4_journal_start_sb: Detected aborted journal
> > [ 10.168169] EXT4-fs (sda1): Remounting filesystem read-only
> > [ 10.171614] EXT4-fs (sda1): Remounting filesystem read-only
>
> It would be useful to see what pathname is associated with inode 524788.
debugfs: ncheck 524788
Inode Pathname
524788 /etc/rcS.d/S90mountdebugfs
debugfs: cd /etc/rcS.d
debugfs: ls
532482 (12) . 16386 (12) .. 526699 (16) README
524349 (28) S06keyboard-setup 524350 (24) S13pcmciautils
524351 (20) S25brltty 524788 (52) S90mountdebugfs
524354 (36) S37apparmor 524356 (24) S49console-setup
524357 (20) S55urandom 524358 (28) S70screen-cleanup
524359 (24) S70x11-common 524360 (3800) S75policykit
debugfs: stat S90mountdebugfs
Inode: 524788 Type: regular Mode: 0644 Flags: 0x80000
Generation: 1027808588 Version: 0x00000000:00000001
User: 0 Group: 0 Size: 0
File ACL: 0 Directory ACL: 0
Links: 0 Blockcount: 0
Fragment: Address: 0 Number: 0 Size: 0
ctime: 0x4ab94c5d:3dd78a78 -- Tue Sep 22 15:14:53 2009
atime: 0x4ab949a2:58503f24 -- Tue Sep 22 15:03:14 2009
mtime: 0x4ab94c5d:3dd78a78 -- Tue Sep 22 15:14:53 2009
crtime: 0x4ab7bf7b:2589fbd4 -- Mon Sep 21 11:01:31 2009
dtime: 0x4ab94c5d -- Tue Sep 22 15:14:53 2009
Size of extra inode fields: 28
EXTENTS:
> > 2. after a lockup the journal recovery should not fail.
>
> I'm not sure it was a matter of the journal recovery failing. All we
> know for certain is that the filesystem was corrupted after the lockup
> and the subsequent remount. What caused the filesystem corruption is
> open to question at the moment; it could have been caused by the
> lockup; or it could have been a file that was deleted right about the
> time of the lockup; or it could have been some completely random
> filesystem corruption.
>
> It would be useful to know whether the inode in question was
> supposed to have been deleted. If it was, it would be useful to know
> if the dtime reported by debugfs's stat was around the time of the
> original lockup.
/etc/init.d/mountdebugfs is also present in the directory but has dtime
set. mountdebugfs is part of blktrace, which I installed ("apt-get
install blktrace") at least 20 minutes prior to the lockup; I'd
run blktrace, gotten incomprehensible errors, figured out I needed
CONFIG_FTRACE and the relevant block IO tracing options to use blktrace,
reconfigured the kernel, and was rebuilding when it hung.
Hmmm, /var/log/dpkg.log has mtime Sep 23 18:45:48 and is missing today's
activity (blktrace and ncurses5-dev).
/var/cache/apt/archive/blktrace_1.0.1-2_amd64.deb also has dtime set.
fsck.ext4 has a lot to say -- looks like something bad happened Sep 22
and there's been corruption lurking since. I used the machine a fair
bit on 9/23 though (including installing some packages), and didn't see
any problems.
I've attached the complete output from "fsck -n /dev/sda1" and "stat
<%d>" on each inode reported to be deleted.
-andy
On Mon, Sep 28, 2009 at 02:28:38PM -0700, Andy Isaacson wrote:
>
> I've attached the complete output from "fsck -n /dev/sda1" and "stat
> <%d>" on each inode reported to be deleted.
>
So the large numbers of multiply-claimed blocks message is definitely
a clue:
> Multiply-claimed block(s) in inode 919422: 3704637
> Multiply-claimed block(s) in inode 928410: 3704637
> Multiply-claimed block(s) in inode 928622: 3703283
> Multiply-claimed block(s) in inode 943927: 3703283
> Multiply-claimed block(s) in inode 933307: 3702930
> Multiply-claimed block(s) in inode 943902: 3702930
What this indicates to me is that an inode table block was written to
the wrong location on disk. In fact, given the large number of inode
numbers involved, it looks like a large number of inode table blocks
were written to the wrong location on disk.
So what happened with the file "/etc/rcS.d/S90mountdebugfs" is probably
_not_ that it was deleted on September 22nd, but rather that sometime
recently the inode table block containing inode #524788 was
overwritten by another inode table block, containing a deleted inode
at that relative position in the inode table block.
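(If you want to see where those inodes live on disk, debugfs's imap
command prints the block group and the inode table block holding a
given inode; a sketch, using the inode from the original error plus one
of the multiply-claimed pairs:

debugfs -R 'imap <524788>' /dev/sda1
debugfs -R 'imap <919422>' /dev/sda1
debugfs -R 'imap <928410>' /dev/sda1

Comparing the reported block numbers against the dumpe2fs output would
show whether the damaged inodes cluster in just a few inode table
blocks.)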
This must have happened since the last successful boot, since with
/etc/rcS.d/S90mountdebugfs pointing at a deleted inode, any attempt to
boot the system after the corruption had taken place would have
resulted in catastrophe.
I'm surprised by how many inode table blocks apparently had gotten
mis-directed. Almost certainly some kind of hardware failure must have
triggered this. I'm not sure what
caused it, but it does seem like your filesystem has been toasted
fairly badly.
At this point my advice to you would be to try to recover as much data
from the disk as you can, and to *not* try to run fsck or mount the
filesystem read/write until you are confident you have recovered all
of the critical files you care about, or have made an image copy of the
disk using dd to a backup hard drive first. If you're really curious
we could try to look at the dumpe2fs output and see if we can find the
pattern of what might have caused so many misdirected writes, but
there's no guarantee that we would be able to find the definitive root
cause, and from a recovery perspective, it's probably faster and less
risky to reinstall your system disk from scratch.
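(For the image copy, the usual approach is something along these lines,
with the destination path -- here just a placeholder -- on a different
disk:

dd if=/dev/sda1 of=/backup/sda1.img bs=1M conv=noerror,sync

conv=noerror,sync keeps dd going past any read errors and pads the bad
blocks with zeros so the offsets in the image stay aligned.)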
Good luck, and I'm sorry your file system had gotten so badly
disrupted.
- Ted
On Mon, Sep 28, 2009 at 11:13:08PM -0400, Theodore Tso wrote:
> What this indicates to me is that an inode table block was written to
> the wrong location on disk. In fact, given the large number of inode
> numbers involved, it looks like a large number of inode table blocks
> were written to the wrong location on disk.
Aha, sounds like an excellent theory.
> I'm surprised by how many inode table blocks apparently had gotten
> mis-directed. Almost certainly some kind of hardware failure must have
> triggered this. I'm not sure what
> caused it, but it does seem like your filesystem has been toasted
> fairly badly.
As I said, the machine hung hard while doing a bunch of writes to a USB
thumbdrive and a kernel compile on sda1. It could be hardware, but I've
been using this laptop as my primary test box for several months and
it's been fairly reliable (as reliable as git-of-the-day is, pretty
much).
I'll run memtest86 and check SMART.
Note that it is running DMAR (the Intel VT-d IOMMU implementation); it
could be that a DMA got messed up -- since the logs didn't make it I
don't know if DMAR reported any DMA protection faults at the time of
failure. The DMAR on this box has had some issues in the past which
seem to be fixed, but ...
> At this point my advice to you would be to try to recover as much data
> from the disk as you can, and to *not* try to run fsck or mount the
Oh, all the data is well backed-up; this is a seriously bleeding-edge
box.
I've taken a complete image of /dev/sda1 and will be reinstalling it.
The image is from after the kernel remounted / RO.
> disk using dd to a backup hard drive first. If you're really curious
> we could try to look at the dumpe2fs output and see if we can find the
> pattern of what might have caused so many misdirected writes, but
> there's no guarantee that we would be able to find the definitive root
> cause, and from a recovery perspective, it's probably faster and less
> risky to reinstall your system disk from scratch.
I would like to get as close to root cause as possible. I have a
filesystem image copied away and I'll be attempting to repro the
failure; this is a test system for a large deployment, so I don't want
any issues lurking. :)
Let me know what debug commands you'd like to run. dumpe2fs output is
at http://web.hexapodia.org/~adi/tmp/dumpe2fs.out
-andy
On Tue, Sep 29, 2009 at 09:12:50AM -0700, Andy Isaacson wrote:
> I'll run memtest86 and check SMART.
The host ran memtest86+ for 8 hours with no errors and the disk isn't
reporting any problems via SMART.
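(SMART was checked with smartctl along these lines -- a sketch, not
necessarily the exact invocation:

smartctl -H /dev/sda    # overall health self-assessment
smartctl -A /dev/sda    # attributes: reallocated/pending sectors, CRC errors, etc.

Nothing flagged in either.)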
> I would like to get as close to root cause as possible. I have a
> filesystem image copied away and I'll be attempting to repro the
> failure; this is a test system for a large deployment, so I don't want
> any issues lurking. :)
>
> Let me know what debug commands you'd like to run. dumpe2fs output is
> at http://web.hexapodia.org/~adi/tmp/dumpe2fs.out
I'm going to try to reproduce the failure today. To my untutored eye
the dumpe2fs output looks OK around group 48.
Thanks for any further suggestions,
-andy