2012-10-26 23:15:52

by Martin

[permalink] [raw]
Subject: ext4 issue after unclean shutdown and reboot

Hi,

sorry for the repetition, but Theodore Ts'o asked me to re-post this as
a new mail thread in order to keep track of this more easily.

cu Martin


------------------ snip ---------------------------

Storyboard for my root filesystem crash (source: system logs and memory)
========================================================================

Oct 13 07:48:15

Computer is booted. Computer is then suspended and resumed a few times.


Oct 15 18:43:19

Final resume event before the issue starts. At some point prior to the
next timestamp the computer freezes. Probably just the video hardware
becoming unresponsive, but the teenage user does not know about ssh or
alt-sysreq and decides to turn the killswitch.

He then remembers he is supposed to do a clean shutdown at all times and
boots the computer again in order to perform a clean shutdown:


Oct 15 19:04:20

Computer is booted for the sole reason to be shut down again.


Oct 15 19:05:15

Computer halts. Nothing unusual in the system logs.


Oct 15 19:56:12

Computer is booted again in order to copy a few files to memory stick.
Unbeknownst to me, the following entries are logged in the system log:

Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
add_dirent_to_buf:1587: inode #655361: block 2629945: comm mount: bad
entry in directory: rec_len % 4 != 0 - offset=360(360), inode=655682,
rec_len=18, name_len=5
Oct 15 20:00:16 harold kernel: Aborting journal on device sda5-8.
Oct 15 20:00:16 harold kernel: EXT4-fs (sda5): Remounting filesystem
read-only
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in
ext4_evict_inode:238: Journal has aborted
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5) in
ext4_create:2120: IO failure
Oct 15 20:00:16 harold hp-systray: hp-systray[1594]: warning: No hp: or
hpfax: devices found in any installed CUPS queue. Exiting.
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
search_dirblock:1098: inode #655361: block 2629945: comm
dbus-daemon-lau: bad entry in directory: rec_len % 4 != 0 -
offset=360(8552), inode=655682, rec_len=18, name_len=5
Oct 15 20:00:16 harold kernel: EXT4-fs error (device sda5):
search_dirblock:1098: inode #655361: block 2629945: comm
dbus-daemon-lau: bad entry in directory: rec_len % 4 != 0 -
offset=360(8552), inode=655682, rec_len=18, name_len=5
Oct 15 20:01:06 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'video' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:01:06 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] No Caching mode page
present
Oct 15 20:01:07 harold kernel: sd 8:0:0:0: >[sdc] Assuming drive cache:
write through
Oct 15 20:01:19 harold udisksd[1710]: Mounted /dev/sdc1 at
/run/media/jan/INTENSO on behalf of uid 1002
Oct 15 20:01:21 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm pool: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:01:59 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'video' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:01:59 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'video' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:02:17 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'video' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:02:20 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:04:54 harold udisksd[1710]: Mounted /dev/sr0 at
/run/media/jan/Sixth Sense on behalf of uid 1002
Oct 15 20:05:31 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm bash: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:05:38 harold last message repeated 16 times
Oct 15 20:06:00 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm ls: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:43 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm ls: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:59 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm mc: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:06:59 harold kernel: EXT4-fs error (device sda5):
htree_dirblock_to_tree:861: inode #655361: block 2629945: comm mc: bad
entry in directory: rec_len % 4 != 0 - offset=360(8552), inode=655682,
rec_len=18, name_len=5
Oct 15 20:07:13 harold kernel: nf_conntrack: automatic helper assignment
is deprecated and it will be removed soon. Use the iptables CT target to
attach helpers instead.


Since the memory stick couldn't be accessed I had decided at this point
to look into /etc/groups which is a common cause for this. The file or
directory entry was missing. Further messages while unmounting:


Oct 15 20:09:25 harold kernel: usb 1-1: >USB disconnect, device number 2
Oct 15 20:09:25 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:26 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:26 harold udisksd[1710]: Cleaning up mount point
/run/media/jan/INTENSO (device 8:33 no longer exist)
Oct 15 20:09:41 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:41 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:41 harold kernel: NVRM: GPU at 0000:02:00:
GPU-918d6d1b-c9fb-6232-601c-99ea6ba768a7
Oct 15 20:09:41 harold acpid: client 1512[0:0] has disconnected
Oct 15 20:09:41 harold acpid: client 1512[0:0] has disconnected
Oct 15 20:09:41 harold acpid: client connected from 2241[0:0]
Oct 15 20:09:41 harold acpid: 1 client rule loaded
Oct 15 20:09:42 harold acpid: client connected from 2241[0:0]
Oct 15 20:09:42 harold acpid: 1 client rule loaded
Oct 15 20:09:42 harold kdm_greet[2245]: Cannot load
/usr/share/apps/kdm/faces/.default.face: Datei oder Verzeichnis nicht
gefunden
Oct 15 20:09:44 harold acpid: client 2241[0:0] has disconnected
Oct 15 20:09:47 harold acpid: client 2241[0:0] has disconnected


At this point I went into runlevel 1 to troubleshoot, but then decided
to reboot and do an fsck from an initrd environment.


Oct 15 20:09:55 harold init: Switching to runlevel: 1
Oct 15 20:09:55 harold udevd[955]: specified group 'plugdev' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'netdev' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'tty' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'dialout' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'kmem' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'video' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'audio' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'lp' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'disk' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'floppy' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'cdrom' unknown
Oct 15 20:09:55 harold udevd[955]: specified group 'tape' unknown
Oct 15 20:09:58 harold squid: getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold squid: getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold (squid-1): getgrnam failed to find groupid for
effective group 'nogroup'
Oct 15 20:09:58 harold acpid: exiting
Oct 15 20:09:58 harold syslogd 1.5.0: restart.
Oct 15 20:09:58 harold kernel: Kernel logging (proc) stopped.
Oct 15 20:09:58 harold kernel: Kernel log daemon terminating.
Oct 15 20:09:58 harold sshd[1307]: Received SIGHUP; restarting.
Oct 15 20:09:58 harold ntpd[1314]: ntpd exiting on signal 1
Oct 15 20:09:58 harold dbus[1333]: [system] Reloaded configuration
Oct 15 20:09:58 harold sshd[2313]: Server listening on 0.0.0.0 port 22.
Oct 15 20:10:04 harold exiting on signal 15


The fsck fixed some issues and testified a clean state.

The next reboot did not even get logged in the system logs, and logon
was refused.

I have now looked at the filesystem again from an initrd environment.
Result: lost+found contains 577 files, 90 of which are directories. All
of them seem to be from /etc. I will try to rename them to their proper
name on another machine, and restore them on the target machine.
However, due to the sheer number this might take forever. Also I am
worried the problem might re-surface, as it has neither been identified
nor fixed.

Full system logs are available on request.

NB: kernel was v3.5.5 with CK1 and BFQ patches, tainted by nvidia
module. The rootfs was mounted as type ext4 with mount options
discard,noatime,nodiratime,errors=remount-ro



2012-10-27 00:07:44

by Theodore Ts'o

[permalink] [raw]
Subject: Re: ext4 issue after unclean shutdown and reboot

I've bcc'ed [email protected], LKML, and greg-kh, since I suspect
they aren't interested in all of these details... we'll keep this on
linux-ext4 for sanity's sake.

On Sat, Oct 27, 2012 at 01:15:42AM +0200, Martin wrote:
>
> sorry for the repetition, but Theodore Ts'o asked me to re-post this
> as a new mail thread in order to keep track of this more easily.

1) Is it repeatable? This is what happened once. Can you reliably reproduce it?

2) Can you send the full dmesg of the previous boot and the boot where
you found the problem?

3) Can you send me the fsck logs?

4) Can you send me the output dumpe2fs on the file system in question?

Thanks....

- Ted