[732715.730069] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0)
[ 496.347230] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0:ix (10742) != EXT_FIRST_INDEX (0)
Same search_left... argument? Same inode?
$ sudo find . -inum 21374007 -print
./home/darxus/.cache/chromium/Default/Cache/data_3
My post about it to the wayland mailing list: http://lists.freedesktop.org/archives/wayland-devel/2012-September/005201.html
This happened running wayland with xwayland (X.org running as a rootless
wayland client) using RAOF's modified Radeon DDX (X driver). I was
running chromium this time it happened, there is some possibility I had
not run chromium since the last time it happened and did run it then,
I'm not sure.
What is the significance of this file? What does the error mean?
It seems like this shouldn't be possible. What could be causing it?
Someone mentioned this could be the result of something "scribbling over
kernel memory". Do these errors, and their similarity, provide any
insight into how to track it down?
The errors were three months apart. They corresponded to filesystem
corruption. Because of that I didn't run xwayland for the three months
between.
--
"When we remember we are all mad, the mysteries of life disappear and
life stands explained." - Mark Twain
http://www.ChaosReigns.com
On Mon, Sep 03, 2012 at 06:02:13PM -0400, [email protected] wrote:
> [732715.730069] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0)
> [ 496.347230] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0:ix (10742) != EXT_FIRST_INDEX (0)
>
> What is the significance of this file? What does the error mean?
The error means that the metadata associated with one of Chromium's
cache files had gotten corrected. You say you saw this exact same
error separated in time by three months. Between those three months,
was the file system corruption fixed by an e2fsck run?
If not, the fact that you are getting this message twice isn't
surprising. The e2fsck program *should* have been run at each reboot,
but this error may have required manual intervention to fix up.
If you did run e2fsck, and the file system corruptions was fixed
between the two times that you saw the EXT4-fs error message, then
that is very interesting. I would discount scribbling over kernel
memory, since it would be pretty unusual that the exact same inode and
the exact same block had gotten corrupted in exactly the same way. It
could be a bug in the graphics driver, perhaps triggered by the way
Wayland is using said graphics driver.
That also seems fairly hard to credit, so I'm going to hope you didn't
actually run e2fsck to fix the file system corruption....
- Ted
On 09/03, Theodore Ts'o wrote:
> The error means that the metadata associated with one of Chromium's
> cache files had gotten corrected. You say you saw this exact same
> error separated in time by three months.
Yes.
> Between those three months,
> was the file system corruption fixed by an e2fsck run?
I'll admit I have a pretty bad habit of skipping fscks when they come
up, but when I got that error, my system had just remounted its main
filesystem readonly, and said those scary things in dmesg, and the first
thing I did was reboot and let fsck do whatever it wanted, including a
number of other scary things requiring me to confirm that I wanted it
to go ahead a bunch of times.
So, hard to say for certain, but it seems unlikely that this correction
would not have been included in that.
> If not, the fact that you are getting this message twice isn't
> surprising. The e2fsck program *should* have been run at each reboot,
> but this error may have required manual intervention to fix up.
>
> If you did run e2fsck, and the file system corruptions was fixed
> between the two times that you saw the EXT4-fs error message, then
> that is very interesting. I would discount scribbling over kernel
> memory, since it would be pretty unusual that the exact same inode and
> the exact same block had gotten corrupted in exactly the same way. It
> could be a bug in the graphics driver, perhaps triggered by the way
> Wayland is using said graphics driver.
>
> That also seems fairly hard to credit, so I'm going to hope you didn't
> actually run e2fsck to fix the file system corruption....
Yeah, this all seems very unlikely, yet...
I have lots of photos of the fsck this time, and all the questions it asked
me. But I guess I should re-fsck (forced), make sure it's all happy, check
my backups, and try to repeat the problem? :/
--
"Eh, wisdom's overrated. I prefer beatings and snacks."
- Unity, Skin Horse
http://www.ChaosReigns.com
On 09/03, Theodore Ts'o wrote:
> On Mon, Sep 03, 2012 at 06:02:13PM -0400, [email protected] wrote:
> > [732715.730069] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0)
> > [ 496.347230] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0:ix (10742) != EXT_FIRST_INDEX (0)
> That also seems fairly hard to credit, so I'm going to hope you didn't
> actually run e2fsck to fix the file system corruption....
Ran fsck, reproduced it again. With chromium on X, no wayland involved.
Just loaded up chromium and started playing a youtube video. Different one
than last time. No idea if the youtube video was necessary. Verified
filesystem was writeable before running chromium, then it wasn't.
[ 4043.097706] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0) (depth 1)!
[ 4043.097711] Aborting journal on device sda1-8.
[ 4043.097873] EXT4-fs (sda1): Remounting filesystem read-only
[ 4043.097883] EXT4-fs error (device sda1) in ext4_da_writepages:3033: IO failure
[ 4043.098011] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 962 pages, ino 21374007; err -30
Same inode again:
$ sudo find . -inum 21374007 -print
./home/darxus/.cache/chromium/Default/Cache/data_3
This was after, from another partition, with this partition
unmounted, doing a smartctl long test, and forced fsck of the
partition, which came back clean (and I created a file named
smart-and-fsck-are-happy-with-this-partition.txt on that partition to
make sure I wasn't getting them confused - it's still where I expect it)
(and updating my backups, and unmounting my backup partitions).
I guess I'm glad it wasn't wayland. I'm still... very curious what it
could possibly be.
I tried catting the file (Cache/data_3) to /dev/null before running
chromium, and I'm pretty sure I remembered to verify the filesystem was
still writeable after that.
This is ubuntu chromium-browser package version
18.0.1025.168~r134367-0ubuntu0.11.10.1. Maybe a bad interaction between it
and my Radeon drivers? (Radeon, not fglrx/catalyst.)
What now?
--
"I would believe only in a God that knows how to Dance." - Nietzsche
http://www.ChaosReigns.com
FYI, after fsck'ing, I checked my filesystem against my backup, and didn't
find anything that changed that shouldn't have changed. Command I used
was:
rsync -imva --compare-dest=/ /media/4tb/bak/dancer-2012-09-04/ /media/4tb/bak/changes/
--
"Just because you're offended, doesn't mean you're right." - Ricky Gervais
http://www.ChaosReigns.com
(anonymous) wrote:
> FYI, after fsck'ing, I checked my filesystem against my backup, and didn't
> find anything that changed that shouldn't have changed. Command I used
> was:
> rsync -imva --compare-dest=/ /media/4tb/bak/dancer-2012-09-04/ /media/4tb/bak/changes/
I ran (or rather run) into this issue as well. Starting on
October 22nd, I saw on my Fedora 16 (3.6.2-1.fc16.i686) sys-
tem:
| Oct 22 13:56:44 passepartout kernel: [ 1395.772939] EXT4-fs error (device dm-4): ext4_ext_search_left:1238: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
| Oct 22 13:56:44 passepartout kernel: [ 1395.772951] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3666 with max blocks 3 with error -5
| Oct 22 13:56:44 passepartout kernel: [ 1395.772955] EXT4-fs (dm-4): This should not happen!! Data will be lost
| Oct 22 13:56:44 passepartout kernel: [ 1395.772957]
and it continued intermittently. Last message before "yes-
terday"'s shutdown was:
| Nov 2 04:14:09 passepartout kernel: [51109.016422] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT
| _FIRST_INDEX (0) (depth 0)!
| Nov 2 04:14:09 passepartout kernel: [51109.016428] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3792 with max blocks 2
| with error -5
| Nov 2 04:14:09 passepartout kernel: [51109.016431] EXT4-fs (dm-4): This should not happen!! Data will be lost
| Nov 2 04:14:09 passepartout kernel: [51109.016431]
Looking at today's boot.log, I see no error detected by
"File System Check", and the manual run of "e2fsck -f"
showed also no errors.
Shortly after starting Chrome, the messages reappeared
again:
| Nov 2 15:15:48 passepartout kernel: [ 1979.196296] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
| Nov 2 15:15:48 passepartout kernel: [ 1979.196306] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3672 with max blocks 2 with error -5
| Nov 2 15:15:48 passepartout kernel: [ 1979.196308] EXT4-fs (dm-4): This should not happen!! Data will be lost
| Nov 2 15:15:48 passepartout kernel: [ 1979.196308]
And indeed:
| [root@passepartout ~]# find ~tim -inum 274258
| /home/tim/.cache/google-chrome/Default/Cache/data_3
| [root@passepartout ~]#
So somehow Chromium/Chrome seems to be able to trigger ker-
nel messages indicating a file system error while no actual
file system errors seem to occur (very big assumption here
because I have no idea how to detect if "data_3" is cor-
rupted).
In my yum.log, I don't see any obvious package update prior
to October 22nd. kernel was updated on October 23rd, Chrome
on October 12th (22.0.1229.94-161065.i386).
Any ideas?
TIA,
Tim
On 11/2/12 10:50 AM, Tim Landscheidt wrote:
> (anonymous) wrote:
>
>> FYI, after fsck'ing, I checked my filesystem against my backup, and didn't
>> find anything that changed that shouldn't have changed. Command I used
>> was:
>
>> rsync -imva --compare-dest=/ /media/4tb/bak/dancer-2012-09-04/ /media/4tb/bak/changes/
>
> I ran (or rather run) into this issue as well. Starting on
> October 22nd, I saw on my Fedora 16 (3.6.2-1.fc16.i686) sys-
> tem:
>
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772939] EXT4-fs error (device dm-4): ext4_ext_search_left:1238: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772951] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3666 with max blocks 3 with error -5
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772955] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772957]
>
> and it continued intermittently. Last message before "yes-
> terday"'s shutdown was:
>
> | Nov 2 04:14:09 passepartout kernel: [51109.016422] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT
> | _FIRST_INDEX (0) (depth 0)!
> | Nov 2 04:14:09 passepartout kernel: [51109.016428] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3792 with max blocks 2
> | with error -5
> | Nov 2 04:14:09 passepartout kernel: [51109.016431] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Nov 2 04:14:09 passepartout kernel: [51109.016431]
>
> Looking at today's boot.log, I see no error detected by
> "File System Check", and the manual run of "e2fsck -f"
> showed also no errors.
>
> Shortly after starting Chrome, the messages reappeared
> again:
>
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196296] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196306] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3672 with max blocks 2 with error -5
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308]
>
> And indeed:
>
> | [root@passepartout ~]# find ~tim -inum 274258
> | /home/tim/.cache/google-chrome/Default/Cache/data_3
> | [root@passepartout ~]#
>
> So somehow Chromium/Chrome seems to be able to trigger ker-
> nel messages indicating a file system error while no actual
> file system errors seem to occur (very big assumption here
> because I have no idea how to detect if "data_3" is cor-
> rupted).
So it's the same inode every time.
What does
# debugfs -R "dump_extents <274258>" /dev/dm-4
show? (or whatever the appropriate device node path is)
You might also create an e2image -r fs image so we could
take a closer look later if needed. You could provide it offline if
filenames are sensitive (no file data is contained in the image).
Or you could use the filename obfuscation option.
But creating the e2image now just to capture the state might
be good.
-Eric
> In my yum.log, I don't see any obvious package update prior
> to October 22nd. kernel was updated on October 23rd, Chrome
> on October 12th (22.0.1229.94-161065.i386).
>
> Any ideas?
>
> TIA,
> Tim
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On 11/2/12 10:50 AM, Tim Landscheidt wrote:
> (anonymous) wrote:
>
>> FYI, after fsck'ing, I checked my filesystem against my backup, and didn't
>> find anything that changed that shouldn't have changed. Command I used
>> was:
>
>> rsync -imva --compare-dest=/ /media/4tb/bak/dancer-2012-09-04/ /media/4tb/bak/changes/
>
> I ran (or rather run) into this issue as well. Starting on
> October 22nd, I saw on my Fedora 16 (3.6.2-1.fc16.i686) sys-
> tem:
>
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772939] EXT4-fs error (device dm-4): ext4_ext_search_left:1238: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772951] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3666 with max blocks 3 with error -5
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772955] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Oct 22 13:56:44 passepartout kernel: [ 1395.772957]
>
> and it continued intermittently. Last message before "yes-
> terday"'s shutdown was:
>
> | Nov 2 04:14:09 passepartout kernel: [51109.016422] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT
> | _FIRST_INDEX (0) (depth 0)!
> | Nov 2 04:14:09 passepartout kernel: [51109.016428] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3792 with max blocks 2
> | with error -5
> | Nov 2 04:14:09 passepartout kernel: [51109.016431] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Nov 2 04:14:09 passepartout kernel: [51109.016431]
>
> Looking at today's boot.log, I see no error detected by
> "File System Check", and the manual run of "e2fsck -f"
> showed also no errors.
Hm, on the image you provided to me for analysis, it was
indeed somewhat corrupted; I got about 400 lines of output
from e2fsck. However - did you take the image while it was
mounted, or did you unmount it first?
Anyway, inode 274258 wasn't in the fsck output.
However, I can reproduce the error you got:
[429903.549764] EXT4-fs error (device loop0): ext4_ext_search_left:1252: inode #274258: comm flush-7:0: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
so I'll see what's going on here.
Thanks,
-Eric
> Shortly after starting Chrome, the messages reappeared
> again:
>
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196296] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196306] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3672 with max blocks 2 with error -5
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308] EXT4-fs (dm-4): This should not happen!! Data will be lost
> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308]
>
> And indeed:
>
> | [root@passepartout ~]# find ~tim -inum 274258
> | /home/tim/.cache/google-chrome/Default/Cache/data_3
> | [root@passepartout ~]#
>
> So somehow Chromium/Chrome seems to be able to trigger ker-
> nel messages indicating a file system error while no actual
> file system errors seem to occur (very big assumption here
> because I have no idea how to detect if "data_3" is cor-
> rupted).
>
> In my yum.log, I don't see any obvious package update prior
> to October 22nd. kernel was updated on October 23rd, Chrome
> on October 12th (22.0.1229.94-161065.i386).
>
> Any ideas?
>
> TIA,
> Tim
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
On 11/2/12 1:55 PM, Tim Landscheidt wrote:
> Eric Sandeen <[email protected]> wrote:
>
>> [...]
>>> Shortly after starting Chrome, the messages reappeared
>>> again:
>
>>> | Nov 2 15:15:48 passepartout kernel: [ 1979.196296] EXT4-fs error (device dm-4): ext4_ext_search_left:1304: inode #274258: comm flush-253:4: ix (3666) != EXT_FIRST_INDEX (0) (depth 0)!
>>> | Nov 2 15:15:48 passepartout kernel: [ 1979.196306] EXT4-fs (dm-4): delayed block allocation failed for inode 274258 at logical offset 3672 with max blocks 2 with error -5
>>> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308] EXT4-fs (dm-4): This should not happen!! Data will be lost
>>> | Nov 2 15:15:48 passepartout kernel: [ 1979.196308]
>
>>> And indeed:
>
>>> | [root@passepartout ~]# find ~tim -inum 274258
>>> | /home/tim/.cache/google-chrome/Default/Cache/data_3
>>> | [root@passepartout ~]#
>
>>> So somehow Chromium/Chrome seems to be able to trigger ker-
>>> nel messages indicating a file system error while no actual
>>> file system errors seem to occur (very big assumption here
>>> because I have no idea how to detect if "data_3" is cor-
>>> rupted).
>
>> So it's the same inode every time.
>
>> What does
>
>> # debugfs -R "dump_extents <274258>" /dev/dm-4
>
>> show? (or whatever the appropriate device node path is)
>
> See attachment.
Level Entries Logical Physical Length Flags
0/ 1 1/ 2 0 - 3665 1114157 3666
1/ 1 1/ 59 0 - 132 510721 - 510853 133
1/ 1 2/ 59 133 - 139 511415 - 511421 7
...
1/ 1 58/ 59 3039 - 3664 573440 - 574065 626
1/ 1 59/ 59 3665 - 4092 574066 - 574493 428
0/ 1 2/ 2 3666 - 9217 395702 5552
1/ 1 1/307 4093 - 4093 574494 - 574494 1
1/ 1 2/307 4094 - 4095 395758 - 395759 2
...
Ok, so the first top-level record says it covers logical 0->3665,
but the last extent actually goes from 3665->4092.
Then the next top level extent says it covers 3666->9217,
but that overlaps w/ the last real extent just prior, and
the first allocated extent under it actually starts at 4093.
so,
a) how'd it get into this state, and
b) why doesn't fsck care ...
Looking into that . . .
-Eric
On 11/13/12 12:28 PM, Eric Sandeen wrote:
> On 11/2/12 1:55 PM, Tim Landscheidt wrote:
...
>>> What does
>>
>>> # debugfs -R "dump_extents <274258>" /dev/dm-4
>>
>>> show? (or whatever the appropriate device node path is)
>>
>> See attachment.
>
> Level Entries Logical Physical Length Flags
> 0/ 1 1/ 2 0 - 3665 1114157 3666
> 1/ 1 1/ 59 0 - 132 510721 - 510853 133
> 1/ 1 2/ 59 133 - 139 511415 - 511421 7
> ...
> 1/ 1 58/ 59 3039 - 3664 573440 - 574065 626
> 1/ 1 59/ 59 3665 - 4092 574066 - 574493 428
> 0/ 1 2/ 2 3666 - 9217 395702 5552
> 1/ 1 1/307 4093 - 4093 574494 - 574494 1
> 1/ 1 2/307 4094 - 4095 395758 - 395759 2
> ...
>
> Ok, so the first top-level record says it covers logical 0->3665,
> but the last extent actually goes from 3665->4092.
>
> Then the next top level extent says it covers 3666->9217,
> but that overlaps w/ the last real extent just prior, and
> the first allocated extent under it actually starts at 4093.
>
> so,
> a) how'd it get into this state, and
> b) why doesn't fsck care ...
>
> Looking into that . . .
So this is pre-existing corruption somehow' that 2nd 0-level
record's first logical block should match the first 1st-level
extent's logical block under it. I was hoping you had just
run into some sort of extent tree traversal bug when looking
up this block, but I think you have an actual corruption in the
extent tree already.
You could work around this by just copying the file then renaming
it back, to get a different (presumably correct) extent tree.
But it'll be hard to work out how it got into this state, I don't
yet see how this can happen. :(
Does your box wind up crashing or losing power, and replaying the
log once? I'm wondering if it's possible that an extent tree
metadata update got lost in a crash . . .
-Eric
-Eric
On 9/3/12 5:02 PM, [email protected] wrote:
> [732715.730069] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0)
> [ 496.347230] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0:ix (10742) != EXT_FIRST_INDEX (0)
>
> Same search_left... argument? Same inode?
>
> $ sudo find . -inum 21374007 -print
> ./home/darxus/.cache/chromium/Default/Cache/data_3
If you are still running into this, could you also do:
debugfs -R "dump_extents <21374007>" /dev/sda1
and send me and/or the list the result of that?
Thanks,
-Eric
On 11/15/12 4:01 PM, Eric Sandeen wrote:
> On 9/3/12 5:02 PM, [email protected] wrote:
>> [732715.730069] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0: ix (10742) != EXT_FIRST_INDEX (0)
>> [ 496.347230] EXT4-fs error (device sda1): ext4_ext_search_left:1275: inode #21374007: comm flush-8:0:ix (10742) != EXT_FIRST_INDEX (0)
>>
>> Same search_left... argument? Same inode?
>>
>> $ sudo find . -inum 21374007 -print
>> ./home/darxus/.cache/chromium/Default/Cache/data_3
>
> If you are still running into this, could you also do:
>
> debugfs -R "dump_extents <21374007>" /dev/sda1
>
> and send me and/or the list the result of that?
Oh, and a question for both of you guys - have you used e4defrag on this filesystem (or this file?)
-Eric