Hi!
Ok, this ext4 filesystem does _not_ have an easy life: it is in a USB enclosure, I wanted
to use it as a root filesystem, and it is connected to an OLPC-1.75 running some kind
of linux-3.0 kernel.
So power disconnects are common, and even during a regular reboot, I hear the disk doing
an emergency head park.
I don't know how barriers work over USB...
Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
OTOH, it is just a root filesystem... and nothing above should prevent correct operation
(right?)
On the last mount, it remounted itself read-only, so there's time for fsck, I guess...
But I believe this means I am going to lose all the data on the filesystem, right?
Any idea what could have happened? It looks like garbage was written over the filesystem, right?
I'm using device-mapper on another partition (for encrypted ext4). I fear I lost that filesystem,
too, but without a root filesystem, I can't check it easily.
Any idea what to do so that it does not repeat?
Should I switch to plain ext2?
Pavel
-bash-4.1# fsck /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
fsck.ext2: Superblock invalid, trying backup blocks...
Superblock has an invalid journal (inode 8).
Clear<y>? yes
*** ext3 journal has been deleted - filesystem is now ext2 only ***
One or more block group descriptor checksums are invalid. Fix<y>? yes
Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
[... the same message repeats for group descriptors 3 through 119 ...]
Group descriptor 120 checksum is invalid. FIXED.
armroot contains a file system with errors, check forced.
Resize inode not valid. Recreate<y>? yes
Pass 1: Checking inodes, blocks, and sizes
Root inode is not a directory. Clear<y>? yes
Reserved inode 3 (<The ACL index inode>) has invalid mode. Clear<y>? yes
Inode 3 has a bad extended attribute block 46. Clear<y>? yes
Inode 3 should not have EOFBLOCKS_FL set (size 439101102805825840, lblk -1)
Clear<y>? yes
Inode 3, i_size is 439101102805825840, should be 0. Fix<y>? yes
Inode 3, i_blocks is 59567760357681, should be 0. Fix<y>? yes
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Hi!
> Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted
> to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind
> of linux-3.0 kernels.
>
> So power disconnects are common, and even during regular reboot, I hear disk doing
> emergency parking.
>
> I don't know how barriers work over USB...
>
> Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
>
> OTOH, it is just a root filesystem... and nothing above should prevent correct operation
> (right?)
>
> On last mount, it remounted itself read-only, so there's time for fsck, I guess...
>
> But I believe this means I am going to lose all the data on the filesystem, right?
It looks like the filesystem contains _way_ too many 0xffff's:
Inode 655221 has compression flag set on filesystem without compression support. Clear<y>? yes
Inode 655221 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? yes
Inode 655221 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear<y>? yes
Inode 655221, i_size is 18446744073709551615, should be 0. Fix<y>? yes
Inode 655221, i_blocks is 281474976710655, should be 0. Fix<y>? yes
Inode 655222 is in use, but has dtime set. Fix<y>? yes
Inode 655222 has imagic flag set. Clear<y>? yes
Inode 655222 has a extra size (65535) which is invalid
Fix<y>? yes
Inode 655222 has compression flag set on filesystem without compression support. Clear<y>? yes
Inode 655222 has INDEX_FL flag set but is not a directory.
Clear HTree index<y>? yes
Inode 655222 should not have EOFBLOCKS_FL set (size 18446744073709551615, lblk -1)
Clear<y>?
I saved the beginning of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
then ran out of patience. So there may be something for analysis, but...
Any ideas?
Pavel
On Thu 2014-06-26 22:30:52, Pavel Machek wrote:
> Hi!
>
> > Ok, this ext4 filesystem does _not_ have easy life: it is in usb envelope, I wanted
> > to use it as a root filesystem, and it is connected to OLPC-1.75, running some kind
> > of linux-3.0 kernels.
> >
> > So power disconnects are common, and even during regular reboot, I hear disk doing
> > emergency parking.
> >
> > I don't know how barriers work over USB...
> >
> > Plus the drive has physical bad blocks, but I attempted to mark them with fsck -c.
> >
> > OTOH, it is just a root filesystem... and nothing above should prevent correct operation
> > (right?)
> >
> > On last mount, it remounted itself read-only, so there's time for fsck, I guess...
> >
> > But I believe this means I am going to lose all the data on the filesystem, right?
>
> It looks like the filesystem contains _way_ too many 0xffff's:
>
> Inode 655221 has compression flag set on filesystem without compression support. Clear<y>? yes
>
> Inode 655221 has INDEX_FL flag set but is not a directory.
> Clear HTree index<y>? yes
...
And for every bug in the kernel, there's one in fsck: I did not expect it, but fsck actually
succeeded and marked the fs as clean. But a second fsck had issues with /lost+found...
-bash-4.1# fsck /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
armroot: clean, 132690/985424 files, 1023715/3934116 blocks
-bash-4.1# fsck -f /dev/sdc4
fsck from util-linux-ng 2.18
e2fsck 1.41.12 (17-May-2010)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
'..' in /lost+found/#652090/auth-for-pavel-wzJd6X (17) is /lost+found (11), should be /lost+found/#652090 (652090).
Fix<y>? yes
Pass 4: Checking reference counts
Pass 5: Checking group summary information
armroot: ***** FILE SYSTEM WAS MODIFIED *****
armroot: 132690/985424 files (0.1% non-contiguous), 1023715/3934116 blocks
-bash-4.1#
On Thu, Jun 26, 2014 at 10:30:52PM +0200, Pavel Machek wrote:
>
> It looks like the filesystem contains _way_ too many 0xffff's:
That sounds like it's a hardware issue. It may be that the controller
did something insane while trying to do a write at the point when the
disk drive was disconnected (and so the drive suffered a power drop).
> I saved beggining of the filesystem using cat /dev/sdc4 | gzip -9 - > /dev/sda3, but
> then ran out of patience. So there may be something for analysis, but...
The way to snapshot just the metadata blocks for analysis is:
e2image -r /dev/hdc4 | bzip2 > ~/hdc4.e2i.bz2
But in this case, I doubt it will be very helpful, because
fundamentally this appears to be a hardware issue.
- Ted
On Thu, Jun 26, 2014 at 10:50:49PM +0200, Pavel Machek wrote:
>
> And for every bug in kernel, there's one in fsck: I did not expect it, but fsck actually
> suceeded, and marked fs as clean. But second fsck had issues with /lost+found...
I'd need the previous fsck transcript to have any idea what might have
happened. I'll note, though, that you are using an ancient version of
e2fsck (1.41.12; there have been a huge number of bug fixes since
May 2010...)
- Ted
On Thu, 2014-06-26 at 22:20 +0200, Pavel Machek wrote:
> Hi!
>
> Ok, this ext4 filesystem does _not_ have easy life: it is in usb
> envelope, I wanted
> to use it as a root filesystem, and it is connected to OLPC-1.75,
> running some kind
> of linux-3.0 kernels.
>
> So power disconnects are common, and even during regular reboot, I
> hear disk doing
> emergency parking.
>
> I don't know how barriers work over USB...
Just like with other SCSI devices.
HTH
Oliver
Hi!
> > It looks like the filesystem contains _way_ too many 0xffff's:
>
> That sounds like it's a hardware issue. It may be that the controller
> did something insane while trying to do a write at the point when the
> disk drive was disconnected (and so the drive suffered a power
> drop).
Interesting. I tried to compare the damaged image with the original, and
yes, way too many 0xffff's. But they are not even block-aligned? And
they start from byte 0... that area is not normally written, IIRC?
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000030 ffff 07ff 0000 0000 0000 0000 0000 0000
0000040 0000 0000 0000 0000 0000 0000 0000 0000
*
00003f0 0000 0000 0000 0000 0000 ffff ffff ffff
0000400 ffff ffff ffff ffff ffff ffff 3e28 002d
0000410 fd57 000c ffff ffff ffff ffff ffff ffff
0000420 ffff ffff ffff ffff ffff ffff ffff ffff
*
0000550 ffff ffff ffff ffff 0000 0000 ffff ffff
0000560 ffff ffff ffff ffff ffff ffff ffff ffff
0000570 ffff ffff ffff ffff 4ddb 0055 0000 0000
0000580 ffff ffff ffff ffff ffff ffff ffff ffff
0000590 ffff ffff 007e 0000 ffff ffff ffff ffff
00005a0 ffff ffff ffff ffff ffff ffff ffff ffff
*
00005c0 ffff ffff ffff ffff ffff ffff 682e 53ac
00005d0 3a29 000a 0515 0000 d144 002e 0000 0000
00005e0 7865 3474 6d5f 7061 625f 6f6c 6b63 0073
00005f0 0000 0000 0000 0000 0000 0000 0000 0000
0000600 ffff ffff ffff ffff ffff ffff ffff ffff
*
0001000 41c0 03e9 1000 0000 6133 53ac 6133 53ac
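As an editorial aside, the od output above can be summarized numerically. The sketch below is mine, not from the thread (`ffff_ratio` and the sample buffer are made up for illustration): it counts what fraction of a buffer's 16-bit words are 0xffff, which helps distinguish this kind of all-ones corruption from ordinary stale data.

```python
# Rough sketch (not from the thread): quantify how much of a raw image
# consists of 0xffff words.  The helper name is hypothetical.
import struct

def ffff_ratio(data: bytes) -> float:
    """Fraction of little-endian 16-bit words that are 0xffff."""
    n = len(data) // 2
    if n == 0:
        return 0.0
    words = struct.unpack("<%dH" % n, data[:n * 2])
    return sum(1 for w in words if w == 0xFFFF) / n

# On a healthy ext4 image the first 1 KiB (the boot area) is mostly zeros,
# so a high ratio there is a strong hint of external corruption.
sample = b"\xff\xff" * 24 + b"\x00\x00" * 8   # toy stand-in for the dump above
print("%.0f%% of words are 0xffff" % (100 * ffff_ratio(sample)))   # → 75%
```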
> > And for every bug in kernel, there's one in fsck: I did not expect it, but fsck actually
> > suceeded, and marked fs as clean. But second fsck had issues with /lost+found...
>
> I'd need the previous fsck transcript to have any idea what might have
> happened. I'll note though you are using an ancient version of e2fsck
> (1.41.12, and there have been a huge number of bug fixes since
> May 2010....)
Sorry for picking at fsck. No, it did quite a good job given the
circumstances... and it probably does not make sense to debug an old
version.
One more thing that I noticed: fsck notices a bad checksum on an inode,
and then offers to fix the checksum, with 'y' being the default. If
there's trash in the inode, that will just induce more errors
(including, potentially, multiply-claimed blocks?). Would it make more
sense to clear inodes with bad checksums?
Thanks and best regards,
Pavel
On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
>
> One more thing that I noticed: fsck notices bad checksum on inode, and
> then offers to fix the checksum with 'y' being the default. If there's
> trash in the inode, that will just induce more errors. (Including
> potentially doubly-linked blocks?) Would it make more sense to clear
> the inodes with bad checksums?
Metadata checksum support isn't in e2fsprogs 1.41 or 1.42; it will be in
the to-be-released e2fsprogs 1.43. And yes, we need to change things
so that the default answer is to zero the inode. We didn't do that
initially because we were more suspicious of the new metadata checksum
code in the kernel and e2fsprogs than we were of hardware faults. :-)
Cheers,
- Ted
On Sun 2014-06-29 17:04:28, Theodore Ts'o wrote:
> On Sun, Jun 29, 2014 at 10:25:16PM +0200, Pavel Machek wrote:
> >
> > One more thing that I noticed: fsck notices bad checksum on inode, and
> > then offers to fix the checksum with 'y' being the default. If there's
> > trash in the inode, that will just induce more errors. (Including
> > potentially doubly-linked blocks?) Would it make more sense to clear
> > the inodes with bad checksums?
>
> Metadata checksums aren't in e2fsprogs 1.41 or 1.42. It will be in
> the to-be-released e2fsprogs 1.43, and yes, we need to change things
> so that the default answer is to zero the inode. We didn't do that
> initially because we were more suspicious of the new metadata checksum
> code in the kernel and e2fsprogs than we were of hardware faults.
> :-)
:-). Aha, and I misremembered: it was block group descriptor checksums,
not inode checksums:
One or more block group descriptor checksums are invalid. Fix? yes
Group descriptor 0 checksum is invalid. FIXED.
Group descriptor 1 checksum is invalid. FIXED.
Group descriptor 2 checksum is invalid. FIXED.
Group descriptor 3 checksum is invalid. FIXED.
I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
disk case.
One possibility is that the OLPC is unable to provide enough power from
its two USB ports to run the Seagate Momentus 5400.6, and that the hard
drive fails to detect the brown-out and does something wrong. (Are
SATA drives expected to work at 4.5V? Because that's what is
guaranteed on USB, IIRC.)
Heavy corruption happened when I was charging the phone _and_ running
the hard drive, from the OLPC. Now I have seen cases when OLPC crashed
on device plug-in, in what looked like a brown-out...
Best regards,
Pavel
On Mon, Jun 30, 2014 at 08:46:44AM +0200, Pavel Machek wrote:
> :-). Aha, and I misremembered, it was block descriptor checksums, not
> inode checksums:
>
> One or more block group descriptor checksums are invalid. Fix? yes
>
> Group descriptor 0 checksum is invalid. FIXED.
> Group descriptor 1 checksum is invalid. FIXED.
> Group descriptor 2 checksum is invalid. FIXED.
> Group descriptor 3 checksum is invalid. FIXED.
Yeah, what we should be doing here is to try the backup block group
descriptors and check to see if they are valid, and if so, use them
instead.
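As an aside on where those backups live: with the sparse_super feature, ext2/3/4 keeps superblock and group-descriptor backups only in group 0, group 1, and groups whose number is a power of 3, 5, or 7. The sketch below is mine (not e2fsprogs code); it reproduces the block numbers that mke2fs prints in its "Superblock backups stored on blocks:" line for a small 1 KiB-block filesystem.

```python
# Sketch: compute backup superblock locations under sparse_super.
# Groups 0, 1, and powers of 3, 5, 7 carry backups.

def has_super(group: int) -> bool:
    """True if this block group holds a superblock/descriptor backup."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        g = base
        while g < group:
            g *= base
        if g == group:
            return True
    return False

def backup_superblocks(total_groups: int, blocks_per_group: int, first_block: int):
    """Block numbers of the backup superblocks (groups > 0)."""
    return [first_block + g * blocks_per_group
            for g in range(1, total_groups) if has_super(g)]

# 1 KiB block size: 8192 blocks per group, first data block is 1.
print(backup_superblocks(10, 8192, 1))   # → [8193, 24577, 40961, 57345, 73729]
```

These are exactly the numbers one would pass to `e2fsck -b` when the primary superblock is gone.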
> I'm still trying to figure out what went wrong in the OLPC-1.75 + USB
> disk case.
>
> One possibility is that OLPC is unable to provide enough power from
> the two USB ports to power Seagate Momentus 5400.6, and that the hard
> drive fails to detect the brown-out and does something wrong. (Are
> SATA drives expected to work at 4.5V? Because that's what is
> guaranteed on USB, IIRC).
The USB spec seems to require 5V +/- 0.25V, which also seems to be the
spec on laptop drives. It wouldn't surprise me if the OLPC (or its
power adapter) is a bit dodgy under heavy load, though. It might be
useful for you to measure the voltage and amps delivered at the USB
ports ...
> Heavy corruption happened when I was charging the phone _and_ running
> the hard drive, from the OLPC. Now I have seen cases when OLPC crashed
> on device plug-in, in what looked like a brown-out...
.... and from the power brick to see if either is out of spec.
- Ted
Hi!
(Note that this drive is in a thinkpad x60, and never met the OLPC nor
had any problems).
pavel@duo:~$ uname -a
Linux duo 3.15.0-rc8+ #365 SMP Mon Jun 9 09:18:29 CEST 2014 i686
GNU/Linux
EXT4-fs (sda3): error count: 11
EXT4-fs (sda3): initial error at 1401714179: ext4_mb_generate_buddy:756
EXT4-fs (sda3): last error at 1401714179: ext4_reserve_inode_write:4877
That sounds like a media error to me?
But there's nothing in smart:
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
I rebooted (into 3.14), and fsck claims the filesystem is marked as
clean...? I did fsck -f, no problems.
Heh, now fsck -cf runs, and I get the same kernel messages. fsck says:
"updating bad block inode", but it does not say how many bad blocks it
found (if any). At the end it says "filesystem was modified" and
"reboot linux", so I assume it found something? OTOH, dumpe2fs -b
/dev/sda3 does not report anything.
What is going on there?
Best regards,
Pavel
On Fri, Jul 04, 2014 at 12:23:07PM +0200, Pavel Machek wrote:
>
> pavel@duo:~$ uname -a
> Linux duo 3.15.0-rc8+ #365 SMP Mon Jun 9 09:18:29 CEST 2014 i686
> GNU/Linux
>
> EXT4-fs (sda3): error count: 11
> EXT4-fs (sda3): initial error at 1401714179: ext4_mb_generate_buddy:756
> EXT4-fs (sda3): last error at 1401714179: ext4_reserve_inode_write:4877
>
> That sounds like media error to me?
If you search your system logs since the last fsck, you should find 11
instances of the "EXT4-fs error" message, which means that some file
system inconsistencies were detected. The first error was detected at:
% date -d @1401714179
Mon Jun 2 09:02:59 EDT 2014
... which means that you haven't rebooted in a month, or your boot
scripts aren't automatically running fsck, or your clock is
incorrect.
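(Editorial aside: the same decoding Ted does with date(1) can be done in Python; the timestamp below is the one from the log, the rest is illustration.)

```python
# The "initial error at 1401714179" field in the EXT4-fs message is a
# Unix timestamp (seconds since the epoch), not a block number.
from datetime import datetime, timezone

ts = 1401714179
print(datetime.fromtimestamp(ts, tz=timezone.utc))
# → 2014-06-02 13:02:59+00:00  (i.e. Mon Jun  2 09:02:59 EDT 2014)
```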
The first inconsistency was detected in the function
ext4_mb_generate_buddy(), in line 756. This means there's an
inconsistency between the number of blocks marked as in use in a block
allocation bitmap, and summary statistics in the block group
descriptor. This can be caused by a hardware hiccup, or some kind of
kernel bug.
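(Editorial aside: the consistency check described above can be illustrated with a toy model. This is my sketch, not ext4 source; the function names are made up.)

```python
# Toy illustration of the check that fires in ext4_mb_generate_buddy():
# compare the free-block count stored in the group descriptor against the
# number of zero bits in the group's block allocation bitmap.

def free_blocks_in_bitmap(bitmap: bytes, blocks_per_group: int) -> int:
    """Count blocks whose 'in use' bit is 0."""
    used = sum(bin(b).count("1") for b in bitmap)
    return blocks_per_group - used

def check_group(bitmap: bytes, bg_free_count: int, blocks_per_group: int) -> bool:
    """True when descriptor and bitmap agree (the healthy case)."""
    return free_blocks_in_bitmap(bitmap, blocks_per_group) == bg_free_count

bitmap = bytes([0xFF, 0x0F, 0x00, 0x00])      # 12 of 32 blocks marked in use
print(check_group(bitmap, 20, 32))            # → True
print(check_group(bitmap, 25, 32))            # → False: would log an EXT4-fs error
```

A mismatch here means either the bitmap or the descriptor was written without the other, which is why it shows up after hardware hiccups and unclean shutdowns.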
People have been reporting an increased incidence rate of this bug
since 3.15, so it's something we're trying to track down. There have
been some reports of eMMC bugs in 3.15 (see one such report at:
https://lkml.org/lkml/2014/6/12/19). But other people are reporting
this on SSD's such as the Samsung 840 PRO, which is a SATA attached
device. See some of the messages on ext4 with the subject line:
"ext4: journal has aborted").
At this point I suspect we have multiple causes that result in the
same symptom that have all appeared at about the same time, which has
made tracking down the root cause(s) very difficult.
It does seem to happen more often after an unclean shutdown, and there
does seem to be a very high correlation with eMMC devices. It's
possible there is a jbd2 bug that got introduced recently, where ext4
is modifying some field outside of a journal transaction. But I
haven't been able to reproduce this yet in controlled circumstances.
What I need from people reporting problems:
* What is the HDD/SSD/eMMC device involved
* What kernel version were you running
* What distribution are you running (more so I know what the init
scripts might or might not have been doing vis-a-vis running fsck
after a crash)
* Was there an unclean shutdown / power drop / hard reset involved?
If so, did the HDD/SSD/eMMC lose power, or was the reset button hit
on the machine?
* What sort of workload / application / test program running before
the crash, if any?
I really need all of this information, especially since at this point
I suspect there may be more than one cause with similar symptoms. So
it's important that folks not assume, just because someone else has
reported a similar symptom with one set of hardware / software
details, that it's the same problem as theirs and that they don't need
to report any more info. I need as many data points as possible at
this point.
- Ted
Hi!
> > pavel@duo:~$ uname -a
> > Linux duo 3.15.0-rc8+ #365 SMP Mon Jun 9 09:18:29 CEST 2014 i686
> > GNU/Linux
> >
> > EXT4-fs (sda3): error count: 11
> > EXT4-fs (sda3): initial error at 1401714179: ext4_mb_generate_buddy:756
> > EXT4-fs (sda3): last error at 1401714179: ext4_reserve_inode_write:4877
> >
> > That sounds like media error to me?
>
> If you search your system logs since the last fsck, you should find 11
> instances of "EXT4-fs error" message, which means that there was some
> file system inconsisntencies detected. The first error was detected at:
>
> % date -d @1401714179
> Mon Jun 2 09:02:59 EDT 2014
Interesting. I always assumed 140... was a block number.
> ... which means that you haven't rebooted in a month, or your boot
> scripts aren't automatically running fsck, or your clock is
> incorrect.
I suspect something is wrong with the reporting. I got this in the kernel
log _while running fsck_. fsck was clean (take a look at the original
email). I got a weird report with fsck -c: it told me the filesystem was
modified, but I don't think I have bad blocks there.
I believe my scripts are running fsck automatically, and yes, I rebooted
a lot in the last month. It _may_ be possible that this x60 had a
different hard drive last month, and I copied it over bit-by-bit.
> It does seem to happen more often after an unclean shutdown, and there
> does seem to be a very high correlation with eMMC devices. It's
> possible there is a jbd2 bug that got introduced recently, where ext4
> is modifying some field outside of a journal transaction. But I
> haven't been able to reproduce this yet in controlled circumstances.
>
> What I need from people reporting problems:
>
> * What is the HDD/SSD/eMMC device involved
SATA hdd, will get you exact data.
> * What kernel version were you running
For last month? Various, 3.10 to 3.16-rc, mostly 3.15+.
> * What distribution are you running (more so I know what the init
> scripts might or might not have been doing vis-a-vis running fsck
> after a crash)
Debian 6.
> * Was there an unclean shutdown / power drop / hard reset involved?
> If so, did the HDD/SSD/eMMC lose power, or was the reset button hit
> on the machine?
Crash in last month? Probably yes.
> * What sort of workload / application / test program running before
> the crash, if any?
Just usual desktop / kernel development.
> and so they don't need to report anymore info. I need as many data
> points as possible at this point.
You'll get them.
Is it possible that my fsck is so old that it does not clear this
"filesystem had an error in the past" flag? Because I strongly suspect
I'll boot into init=/bin/bash, run fsck, it will tell me "all clean",
and the messages will repeat in the middle of the fsck run.
Best regards,
Pavel
Hi!
> > What I need from people reporting problems:
> >
> > * What is the HDD/SSD/eMMC device involved
>
> SATA hdd, will get you exact data.
Hitachi HTS545050A7E380; got it from a PS/3 on April 25, 2014; it never
had problems according to SMART.
> > * What kernel version were you running
>
> For last month? Various, 3.10 to 3.16-rc, mostly 3.15+.
>
> > * What distribution are you running (more so I know what the init
> > scripts might or might not have been doing vis-a-vis running fsck
> > after a crash)
>
> Debian 6.
6.0.9
> Is it possible that my fsck is so old it does not clear this "filesystem
> had error in past" flag? Because I strongly suspect I'll boot into
> init=/bin/bash, run fsck, it will tell me "all clean", and the messages
> will repeat in the middle of fsck run.
And indeed:
..init=/bin/bash
# fsck /dev/sda3
e2fsck 1.41.12
rootfs: clean
# date +%s
1404496...
#
EXT4-fs (sda3): error count: 11
EXT4-fs (sda3): initial error at 1401741...
...
(hand copied)
Thanks,
Pavel
On Fri, Jul 04, 2014 at 07:21:04PM +0200, Pavel Machek wrote:
>
> Is it possible that my fsck is so old it does not clear this "filesystem
> had error in past" flag? Because I strongly suspect I'll boot into
> init=/bin/bash, run fsck, it will tell me "all clean", and the messages
> will repeat in the middle of fsck run.
Yes, that's what's going on. E2fsprogs v1.41.12 does not have the
code to clear those fields in the superblock; that code was added in
v1.41.13.
(There have also been a ****huge**** number of bug fixes since May
2010, which is when 1.41.12 was released, so I'd strongly suggest that
you upgrade to a newer version of e2fsprogs. In particular, DON'T try
resizing an ext4 file system, either on-line or off-line, with a
version of e2fsprogs that ancient; there is a very good chance you
will badly corrupt the file system.)
Cheers,
- Ted
On Fri 2014-07-04 14:56:26, Theodore Ts'o wrote:
> On Fri, Jul 04, 2014 at 07:21:04PM +0200, Pavel Machek wrote:
> >
> > Is it possible that my fsck is so old it does not clear this "filesystem
> > had error in past" flag? Because I strongly suspect I'll boot into
> > init=/bin/bash, run fsck, it will tell me "all clean", and the messages
> > will repeat in the middle of fsck run.
>
> Yes, that's what's going on. E2fsprogs v1.41.12 does not have the
> code to clear those fields in the superblock; that code was added in
> v1.41.13.
>
> (There have also been a ****huge**** number of bug fixes since May
> 2010, which is when 1.41.12 was released, so I'd strongly suggest that
> you upgrade to a newer version of e2fsprogs. In particular DON'T try
> resizing an an ext4 file system, either on-line or off-line with a
> version of e2fsprogs that ancient; there is a very good chance you
> will badly corrupt the file system.)
Ok, I have compiled fsck from git; it calls itself 1.43-WIP (18-May-2014).
If I run it on my /dev/sda3, it still calls the filesystem clean and
quits (even though it should still have the "filesystem had an error in
the past" flag). I ran it with -f, and it said all clean. It did not
mention modifying the filesystem.
Now I'm running fsck.new -cf. I don't think this filesystem has any
bad blocks. Still, it says "rootfs: Updating bad block inode."
... "FILE SYSTEM WAS MODIFIED", "REBOOT LINUX".
While looking at e2fsck sources:
	sprintf(buf, "badblocks -b %d -X %s%s%s %llu",
		fs->blocksize,
		(ctx->options & E2F_OPT_PREEN) ? "" : "-s ",
		(ctx->options & E2F_OPT_WRITECHECK) ? "-n " : "",
		fs->device_name,
		ext2fs_blocks_count(fs->super)-1);
	f = popen(buf, "r");
f = popen(buf, "r");
...is it really a good idea? I think it will do the wrong thing in a
(crazy) setup such as this, or in any setup with a space in the device
name:
root@duo:/dev# ls -al | grep echo
brw-rw---- 1 root disk 8, 3 Jul 6 14:56 `echo ownered`
root@duo:/dev# /usr/local/bin/
e2fsck.new unrar2
root@duo:/dev# /usr/local/bin/e2fsck.new '`echo ownered`'
e2fsck 1.43-WIP (18-May-2014)
`echo ownered` is mounted.
e2fsck: Cannot continue, aborting.
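(Editorial aside: a sketch of the fix Pavel is hinting at, in Python rather than C and not taken from e2fsck itself: pass the device name as a separate argv element instead of interpolating it into a shell string, so backquotes and spaces are never interpreted by a shell. /bin/echo stands in for badblocks here.)

```python
# Demonstrate why argv-style invocation is safe where sprintf()+popen()
# is not.  The booby-trapped device name is the one from the transcript.
import subprocess

device = "`echo ownered`"

# Dangerous pattern (shell=True mirrors popen()): the backquotes would be
# run as a command substitution by the shell.
# subprocess.run("badblocks -b 1024 %s" % device, shell=True)   # don't

# Safe pattern: argv list, no shell involved, name passed through verbatim.
out = subprocess.run(["/bin/echo", device],
                     capture_output=True, text=True).stdout.strip()
print(out)   # the backquotes are printed literally, nothing is executed
```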
Best regards,
Pavel
Hi!
> Now I'm running fsck.new -cf. I don't think this filesystem has any
> bad blocks. Still, it says "rootfs: Updating bad block inode."
> ... "FILE SYSTEM WAS MODIFIED", "REBOOT LINUX".
And here's a patch to fix this ugliness. Unfortunately, it makes fsck
read the inode... but perhaps that is a good idea, as we are then able
to print before/after bad block counts...?
Signed-off-by: Pavel Machek <[email protected]>
Thanks,
Pavel
diff --git a/e2fsck/badblocks.c b/e2fsck/badblocks.c
index 7f3641b..32e08bf 100644
--- a/e2fsck/badblocks.c
+++ b/e2fsck/badblocks.c
@@ -30,6 +30,7 @@ void read_bad_blocks_file(e2fsck_t ctx, const char *bad_blocks_file,
ext2_filsys fs = ctx->fs;
errcode_t retval;
badblocks_list bb_list = 0;
+ int old_bb_count = -1;
FILE *f;
char buf[1024];
@@ -51,14 +52,16 @@ void read_bad_blocks_file(e2fsck_t ctx, const char *bad_blocks_file,
* If we're appending to the bad blocks inode, read in the
* current bad blocks.
*/
- if (!replace_bad_blocks) {
- retval = ext2fs_read_bb_inode(fs, &bb_list);
- if (retval) {
- com_err("ext2fs_read_bb_inode", retval, "%s",
- _("while reading the bad blocks inode"));
- goto fatal;
- }
+ retval = ext2fs_read_bb_inode(fs, &bb_list);
+ if (retval) {
+ com_err("ext2fs_read_bb_inode", retval, "%s",
+ _("while reading the bad blocks inode"));
+ goto fatal;
}
+ old_bb_count = ext2fs_u32_list_count(bb_list);
+ printf("%s: Currently %d bad blocks.\n", ctx->device_name, old_bb_count);
+ if (replace_bad_blocks)
+ bb_list = 0;
/*
* Now read in the bad blocks from the file; if
@@ -95,10 +98,16 @@ void read_bad_blocks_file(e2fsck_t ctx, const char *bad_blocks_file,
goto fatal;
}
+ if ((ext2fs_u32_list_count(bb_list) == 0) &&
+ ((!replace_bad_blocks) || (!old_bb_count))) {
+ printf("%s: No bad blocks found, no update needed.\n", ctx->device_name);
+ return;
+ }
+
/*
* Finally, update the bad blocks from the bad_block_map
*/
- printf("%s: Updating bad block inode.\n", ctx->device_name);
+ printf("%s: Updating bad block inode (%d bad blocks).\n", ctx->device_name, ext2fs_u32_list_count(bb_list));
retval = ext2fs_update_bb_inode(fs, bb_list);
if (retval) {
com_err("ext2fs_update_bb_inode", retval, "%s",
On Sun, Jul 06, 2014 at 03:43:25PM +0200, Pavel Machek wrote:
> Hi!
>
> > Now I'm running fsck.new -cf. I don't think this filesystem has any
> > bad blocks. Still, it says "rootfs: Updating bad block inode."
> > ... "FILE SYSTEM WAS MODIFIED", "REBOOT LINUX".
>
> And here's a patch to fix this ugliness. Unfortunately, it makes e2fsck read
> the inode... but perhaps that is a good idea, as we are then able to print
> before/after bad block counts...?
>
> Signed-off-by: Pavel Machek <[email protected]>
Thanks, I'll take a look at these patches. Honestly, I've been half
tempted to remove the e2fsck -c option entirely. 99.9% of the time,
with modern disks, which have bad block remapping, it doesn't do any
good, and it is often harmful.
In general, e2fsck -c is not something I recommend people use. If you
want to use badblocks by itself to see if there are any blocks that
are suffering read problems, that's fine, but if there is, in general
the safest thing to do is to mount the disk read-only, back it up, and
then either (a) reformat and see if you can restore onto it with
backups w/o any further errors, or (b) just trash the disk, and get a
new one, since in general the contents are way more valuable than the
disk itself. Certainly if, after trying (a), you get any further
errors, (b) is definitely the way to go.
- Ted
Hi!
> > > Now I'm running fsck.new -cf. I don't think this filesystem has any
> > > bad blocks. Still, it says "rootfs: Updating bad block inode."
> > > ... "FILE SYSTEM WAS MODIFIED", "REBOOT LINUX".
> >
> > And here's a patch to fix this ugliness. Unfortunately, it makes e2fsck read
> > the inode... but perhaps that is a good idea, as we are then able to print
> > before/after bad block counts...?
> >
> > Signed-off-by: Pavel Machek <[email protected]>
>
> Thanks, I'll take a look at these patches. Honestly, I've been half
> tempted to remove the e2fsck -c option entirely. 99.9% of the time,
> with modern disks, which have bad block remapping, it doesn't do any
> good, and it is often harmful.
Well, when I got a report about hw problems, badblocks -c was my first
instinct. On the USB HDD, most of the errors were due to a 3.16-rc1
kernel bug, not real problems.
> In general, e2fsck -c is not something I recommend people use. If you
> want to use badblocks by itself to see if there are any blocks that
> are suffering read problems, that's fine, but if there is, in
> general
Actually, badblocks is really tricky to use; I'd not trust myself to
get the parameters right.
> the safest thing to do is to mount the disk read-only, back it up, and
> then either (a) reformat and see if you can restore onto it with
> backups w/o any further errors, or (b) just trash the disk, and get a
> new one, since in general the contents are way more valuable than the
> disk itself. Certainly if, after trying (a), you get any further errors,
> (b) is definitely the way to go.
Well, a 500GB disk takes a while to back up, plus you need the
space. (a) will take a few hours... And sometimes, the data are much
less valuable than the HDD. I do have 2 copies of the data I care
about, using unison to keep them in sync, and I plan to add a 3rd,
encrypted copy on the Seagate Momentus 5400.6 series drive that failed
(a). It seems that Seagate just got their firmware wrong; while in the
thinkpad, the drive worked very much ok, with the exception of a few
sectors that could not be remapped. Now, the USB envelope seems to be
a much harsher environment for a HDD, and it has a few more bad
sectors now, but that's somewhat expected. I was not treating the hdd
as if it had valuable data.
So... please keep fsck -c :-).
[Actually, badblocks documentation leaves something to be desired.
Is ^C safe w.r.t. badblocks -n? Is hard poweroff safe?]
Thanks,
Pavel
(Actually it looks like I forgot to free the bad block list. Incremental patch:)
diff --git a/e2fsck/badblocks.c b/e2fsck/badblocks.c
index 32e08bf..7ae7a61 100644
--- a/e2fsck/badblocks.c
+++ b/e2fsck/badblocks.c
@@ -60,8 +60,10 @@ void read_bad_blocks_file(e2fsck_t ctx, const char *bad_blocks_file,
}
old_bb_count = ext2fs_u32_list_count(bb_list);
printf("%s: Currently %d bad blocks.\n", ctx->device_name, old_bb_count);
- if (replace_bad_blocks)
+ if (replace_bad_blocks) {
+ ext2fs_badblocks_list_free(bb_list);
bb_list = 0;
+ }
/*
* Now read in the bad blocks from the file; if
On Sun, Jul 06, 2014 at 11:37:11PM +0200, Pavel Machek wrote:
>
> Well, when I got a report about hw problems, badblocks -c was my first
> instinct. On the USB HDD, most of the errors were due to a 3.16-rc1
> kernel bug, not real problems.
The problem is that with modern disk drives, this is the *wrong* instinct.
That's my point. In general, trying to mess with the bad blocks list
in the ext2/3/4 file system is just not the right thing to do with
modern disk drives, because modern hard drives do their own bad block
remapping.
Basically, with modern disks, if the HDD has a hard ECC error, it will
return an error --- but if you write to the sector, it will either
rewrite onto that location on the platter, or if that part of the
platter is truly gone, it will remap to the bad block spare pool. So
telling the disk to never use that block again isn't going to be the
right answer.
The badblocks approach to dealing with hardware problems made sense
back when we had IDE disks. But that was over a decade ago. These
days, it's horribly obsolete.
- Ted
On Sun 2014-07-06 21:00:02, Theodore Ts'o wrote:
> On Sun, Jul 06, 2014 at 11:37:11PM +0200, Pavel Machek wrote:
> >
> > Well, when I got a report about hw problems, badblocks -c was my first
> > instinct. On the USB HDD, most of the errors were due to a 3.16-rc1
> > kernel bug, not real problems.
>
> The problem is that with modern disk drives, this is the *wrong* instinct.
> That's my point. In general, trying to mess with the bad blocks list
> in the ext2/3/4 file system is just not the right thing to do with
> modern disk drives, because modern hard drives do their own bad block
> remapping.
Actually... I believe it was the right instinct.
If I just wanted to recover the data, remounting read-only would be
the way to go, then backing it up using dd_rescue. ... But that way
I'd turn bad sectors into silent data corruption.
If I wanted to recover data from that partition and also know what was
damaged, fsck -c (or badblocks, but that's trickier) and then
dd_rescue would be the way to go.
> Basically, with modern disks, if the HDD has a hard ECC error, it will
> return an error --- but if you write to the sector, it will either
> rewrite onto that location on the platter, or if that part of the
> platter is truly gone, it will remap to the bad block spare pool. So
> telling the disk to never use that block again isn't going to be the
> right answer.
Actually -- a tool to force relocations would be nice. It is not
exactly easy to do right by hand.
I know the theory. I had 5 read-error incidents this year.
#1: Seagate refuses to reallocate sectors. Not sure why, I tried
pretty much everything.
#2: 3.16-rc1 produces incorrect errors every 4GB, leading to "bad
sectors" that disappear with other kernels
#3: Some more bad sectors appear on the Seagate
#4: Kernel on thinkpad reports errors in daily check. Which is strange
because there's nothing in SMART.
#5: Some old IDE hdd has bad sectors in unused or unimportant areas.
In #5 the theory might match the reality (I did not check, I trashed
the disks).
> The badblocks approach to dealing with hardware problems made sense
> back when we had IDE disks. But that's been over a decade ago. These
> days, it's horribly obsolete.
Forcing reallocation is hard & tricky. You may want to simply mark the
block bad and lose a tiny bit of disk space... And even if you want to
force reallocation, you want to do fsck -c first, and restore the
affected files from backup.
Pavel
Hi!
With 3.16-rc3, I did a deliberate powerdown by holding down the power
key (not a clean shutdown). On the next boot, I got some scary
messages about data corruption, "filesystem has errors, check forced",
"reboot linux". Unfortunately, the scary messages are now gone forever
(I tried ^S, but was not fast enough), as the system rebooted.
But it seems I have more of the bad stuff coming:
Mounting local filesystems threw an oops, and then mount was killed
due to out-of-memory. I lost the sda2 (or /data) filesystem. Then both
sda3 (root) and sda2 gave me errors. But there's no disk error either
in SMART or in syslog.
Jul 8 01:03:18 duo kernel: EXT4-fs (sda3): error count: 2
Jul 8 01:03:18 duo kernel: EXT4-fs (sda3): initial error at 1404773782: ext4_mb_generate_buddy:757
Jul 8 01:03:18 duo kernel: EXT4-fs (sda3): last error at 1404773782: ext4_mb_generate_buddy:757
Jul 8 01:05:44 duo kernel: EXT4-fs (sda2): error count: 12
Jul 8 01:05:44 duo kernel: EXT4-fs (sda2): initial error at 1404773906: ext4_mb_generate_buddy:757
Jul 8 01:05:44 duo kernel: EXT4-fs (sda2): last error at 1404774058: ext4_journal_check_start:56
(Thinkpad x60 with Hitachi HTS... SATA disk).
I attach the complete syslog from the boot; it should have everything
relevant.
I'm running fsck -f on sda3 now. I'd like to repair sda2 tomorrow.
Best regards,
Pavel
On Mon, Jul 07, 2014 at 08:55:43PM +0200, Pavel Machek wrote:
> If I wanted to recover the data... remount-r would be the way to
> go. Then back it up using dd_rescue. ... But that way I'd turn bad
> sectors into silent data corruption.
>
> If I wanted to recover data from that partition, fsck -c (or
> badblocks, but that's trickier) and then dd_rescue would be the way to go.
Ah, if that's what you're worried about, just do the following:
badblocks -b 4096 -o /tmp/badblocks.sdXX /dev/sdXX
debugfs -R "icheck $(cat /tmp/badblocks.sdXX)" /dev/sdXX > /tmp/bad-inodes
debugfs -R "ncheck $(sed -e 1d /tmp/bad-inodes | awk '{print $2}' | sort -nu)" /dev/sdXX > /tmp/bad-files
This will give you a list of the files that contain blocks that had
I/O errors. So now you know which files have contents which have
probably been corrupted. No more silent data corruption. :-)
> Actually -- tool to do relocations would be nice. It is not exactly
> easy to do it right by hand.
It's not *that* hard. All you really need to do is:
for i in $(cat /tmp/badblocks.sdXX) ; do
dd if=/dev/zero of=/dev/sdXX bs=4k seek=$i count=1
done
e2fsck -f /dev/sdXX
For bonus points, you could write a C program which tries to read the
block one final time before doing the forced write of all zeros.
It's a bit harder if you are trying to interpret the device-driver
dependent error messages, and translate the absolute sector number
into a partition-relative block number. (Except sometimes, depending
on the block device, the number which is given is either a relative
sector number, or a relative block number.)
For disks that do bad block remapping, an even simpler thing to do is
to just delete the corrupted files. When the blocks get reallocated
for some other purpose, the HDD should automatically remap the block
on write, and if the write fails, such that you are getting an I/O
error on the write, it's time to replace the disk.
> Forcing reallocation is hard & tricky. You may want to simply mark it
> bad and lose a tiny bit of disk space... And even if you want to force
> reallocation, you want to do fsck -c, first, and restore affected
> files from backup.
Trying to force reallocation isn't that hard, so long as you have
resigned yourself that you've lost the data in the blocks in question.
And if it doesn't work, for whatever reason, I would simply not trust
the disk any longer.
For me at least, it's all about the value of the disk versus the value
of my time and the data on the disk. When I take my hourly rate into
account ($annual comp divided by 2000), the value of trying to save a
particular hard drive almost never works out in my favor. So these
days, my bias is to do what I can to save the data, but to not fool
around with trying to play fancy games with e2fsck -c. I'll just want
to save what I can, and hopefully, with regular backups, that won't
require heroic measures, and then trash and replace the HDD.
Cheers,
- Ted
P.S. I'm not sure why you consider running badblocks to be tricky.
The only thing you need to be careful about is passing the file system
blocksize to badblocks. And since the block size is almost always 4k
for any non-trivial file system, all you really need to do is
"badblocks -b 4096". Or, if you really like:
badblocks -b $(dumpe2fs -h /dev/sdXX | awk -F: '/^Block size: / {print $2}') /dev/sdXX
See? Easy peasy! :-)