LinuxLists.cc - Mild filesystem corruption on ext4 (no journal)

2009-06-05 10:48:18

Subject: Mild filesystem corruption on ext4 (no journal)

Hi,

I run ext4 without a journal on my cheap netbook with a 4 gig SSD. I
suspect "without a journal" is significant, I don't think I'm doing
anything else strange.

When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable),
the locale breaks every reboot, and I have to repair it by running
locale-gen. This happened now when I only upgraded libc, in order to
play with signalfd(). It also happened before, when I upgraded the
entire machine to debian unstable (which I later reverted).

The problem is that /usr/lib/locale/locale-archive gets corrupted when I
reboot. The exact corruption differs with each reboot (i.e. the md5sum
differs). Last time, the first ~70K was overwritten with data from
xorg.log and my web browsing history. I have copies of the original and
corrupted state which I can send, the full file is 1.3 megs, but I can
limit it to the first 70K, since that's all that was corrupted.
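
One way to pin down which part of the file was overwritten is to compare the saved original against the corrupted copy byte by byte. The sketch below is only an illustration: the function name `differing_range` and the 4096-byte chunk size are assumptions, not something used in this thread.

```python
def differing_range(path_a, path_b, chunk_size=4096):
    """Return (first, last) byte offsets at which the two files differ, or None."""
    first = last = None
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a and not b:
                break
            if a != b:
                for i in range(max(len(a), len(b))):
                    # A byte missing from the shorter file counts as a difference.
                    if a[i:i + 1] != b[i:i + 1]:
                        if first is None:
                            first = offset + i
                        last = offset + i
            offset += max(len(a), len(b))
    return None if first is None else (first, last)
```

Calling it on the good and the corrupted locale-archive would be expected to report a range ending somewhere near the 70K mark described above.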

To try and rule out a faulty userspace program, I marked the file as
read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
file was still read-only and immutable, yet it still became corrupted.

Also, I ran md5sum in the shutdown scripts, after mounting the root
filesystem read-only (which is also preceded by a sync in a different
script). This showed that the file did not appear corrupted at this
point. (Though maybe it was ok in page-cache, but corrupted on-disk).

The locale-archive file is read by the libc locale routines using
mmap(). The mapping is read only and is not modified. It seems likely
that some process has it mapped when the kernel shuts down.

I tried reproducing this by writing a minimal daemon which maps a copy
of the locale-archive file, and starting it just before the filesystem
is remounted read-only. It didn't work though; this copy of the
locale-archive file remained uncorrupted.
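
For illustration, a process that holds a read-only mapping of a file might look like the sketch below. This is not the daemon used in the test above; the names `map_read_only` and `hold_mapping` are hypothetical, and Python's `mmap` module stands in for the mmap() call that libc makes.

```python
import mmap
import time

def map_read_only(path):
    """Map the whole file read-only, the way a locale consumer would."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file; the mapping stays valid after the
        # descriptor is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

def hold_mapping(path, seconds):
    """Touch one byte per page so the pages are resident, then keep the mapping."""
    m = map_read_only(path)
    touched = sum(m[i] for i in range(0, len(m), mmap.PAGESIZE))
    time.sleep(seconds)
    m.close()
    return touched
```

Since the mapping is opened with `ACCESS_READ`, any attempt to assign to it raises `TypeError`, so a process like this cannot modify the file through the mapping.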

I forced a fsck on boot, and the filesystem was reported to be clean. I
am currently running with e2fsprogs v1.41.6 (from debian unstable), and
a custom-built kernel, 2.6.30-rc7.

Thanks in advance!
Alan

2009-06-05 14:40:43

by Aioanei Rares

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Alan Jenkins wrote:
> Hi,
>
> I run ext4 without a journal on my cheap netbook with a 4 gig SSD. I
> suspect "without a journal" is significant, I don't think I'm doing
> anything else strange.
>
> When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable),
> the locale breaks every reboot, and I have to repair it by running
> locale-gen. This happened now when I only upgraded libc, in order to
> play with signalfd(). It also happened before, when I upgraded the
> entire machine to debian unstable (which I later reverted).
>
> The problem is that /usr/lib/locale/locale-archive gets corrupted when
> I reboot. The exact corruption differs with each reboot (i.e. the
> md5sum differs). Last time, the first ~70K was overwritten with data
> from xorg.log and my web browsing history. I have copies of the
> original and corrupted state which I can send, the full file is 1.3
> megs, but I can limit it to the first 70K, since that's all that was
> corrupted.
>
> To try and rule out a faulty userspace program, I marked the file as
> read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
> file was still read-only and immutable, yet it still became corrupted.
>
> Also, I ran md5sum in the shutdown scripts, after mounting the root
> filesystem read-only (which is also preceded by a sync in a different
> script). This showed that the file did not appear corrupted at this
> point. (Though maybe it was ok in page-cache, but corrupted on-disk).
>
> The locale-archive file is read by the libc locale routines using
> mmap(). The mapping is read only and is not modified. It seems
> likely that some process has it mapped when the kernel shuts down.
>
> I tried reproducing this by writing a minimal daemon which maps a
> copy of the locale-archive file, and starting it just before the
> filesystem is remounted read-only. It didn't work though; this copy
> of the locale-archive file remained uncorrupted.
>
> I forced a fsck on boot, and the filesystem was reported to be clean.
> I am currently running with e2fsprogs v1.41.6 (from debian unstable),
> and a custom-built kernel, 2.6.30-rc7.
>
> Thanks in advance!
> Alan
>
I suspect, although I might be wrong, that this is not a kernel-related
problem.

2009-06-05 14:49:43

by Alan Jenkins

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Aioanei Rares wrote:
> Alan Jenkins wrote:
>> Hi,
>>
>> I run ext4 without a journal on my cheap netbook with a 4 gig SSD. I
>> suspect "without a journal" is significant, I don't think I'm doing
>> anything else strange.
>>
>> When I upgrade libc from 2.7 (debian stable) to 2.9 (debian
>> unstable), the locale breaks every reboot, and I have to repair it by
>> running locale-gen. This happened now when I only upgraded libc, in
>> order to play with signalfd(). It also happened before, when I
>> upgraded the entire machine to debian unstable (which I later reverted).
>>
>> The problem is that /usr/lib/locale/locale-archive gets corrupted
>> when I reboot. The exact corruption differs with each reboot (i.e.
>> the md5sum differs). Last time, the first ~70K was overwritten with
>> data from xorg.log and my web browsing history. I have copies of the
>> original and corrupted state which I can send, the full file is 1.3
>> megs, but I can limit it to the first 70K, since that's all that was
>> corrupted.
>>
>> To try and rule out a faulty userspace program, I marked the file as
>> read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
>> file was still read-only and immutable, yet it still became corrupted.
>>
>> Also, I ran md5sum in the shutdown scripts, after mounting the root
>> filesystem read-only (which is also preceded by a sync in a
>> different script). This showed that the file did not appear
>> corrupted at this point. (Though maybe it was ok in page-cache, but
>> corrupted on-disk).
>>
>> The locale-archive file is read by the libc locale routines using
>> mmap(). The mapping is read only and is not modified. It seems
>> likely that some process has it mapped when the kernel shuts down.
>>
>> I tried reproducing this by writing a minimal daemon which maps a
>> copy of the locale-archive file, and starting it just before the
>> filesystem is remounted read-only. It didn't work though; this copy
>> of the locale-archive file remained uncorrupted.
>>
>> I forced a fsck on boot, and the filesystem was reported to be
>> clean. I am currently running with e2fsprogs v1.41.6 (from debian
>> unstable), and a custom-built kernel, 2.6.30-rc7.
>>
>> Thanks in advance!
>> Alan
>>
> I suspect, although I might be wrong, that this is not a kernel-related
> problem.

"To try and rule out a faulty userspace program, I marked the file as
read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
file was still read-only and immutable, yet it still became corrupted."

Since the immutable bit is not respected, I tend to think it is a kernel
problem. Unless the filesystem isn't getting unmounted/flushed properly
for some reason... but I thought the modern kernel had that covered.

I agree it is very suspicious this happens only after upgrading libc.
I'll see if I can find an individual change in libc locale-handling that
might trigger this.

Thanks
Alan

2009-06-05 15:20:46

by Eric Sandeen

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Alan Jenkins wrote:
> Aioanei Rares wrote:

>> I suspect, although I might be wrong, that this is not a kernel-related
>> problem.
>
> "To try and rule out a faulty userspace program, I marked the file as
> read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
> file was still read-only and immutable, yet it still became corrupted."
>
> Since the immutable bit is not respected, I tend to think it is a kernel
> problem. Unless the filesystem isn't getting unmounted/flushed properly
> for some reason... but I thought the modern kernel had that covered.
>
> I agree it is very suspicious this happens only after upgrading libc.
> I'll see if I can find an individual change in libc locale-handling that
> might trigger this.

Maybe you could try some things in your shutdown script, such as
explicitly fsyncing the file, or bmapping it with filefrag, or dropping
caches and rereading it... see what the state is just before the
shutdown compared to after the reboot.
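
A rough sketch of the fsync / drop-caches / reread sequence, assuming it is run as root from the shutdown script. The function names are hypothetical; `/proc/sys/vm/drop_caches` is the standard kernel interface, and writing `3` to it asks the kernel to drop clean page cache plus dentries and inodes.

```python
import hashlib
import os

def md5_of(path):
    """Checksum the file as seen through the mounted filesystem."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def fsync_file(path):
    """Explicitly fsync one file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

def drop_caches():
    """Sync, then ask the kernel to drop clean caches. Returns False if not permitted."""
    os.sync()
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("3\n")
        return True
    except OSError:
        return False

def checksum_before_and_after_drop(path, drop=True):
    """Return (md5 before, md5 after, whether caches were actually dropped)."""
    fsync_file(path)
    before = md5_of(path)
    dropped = drop_caches() if drop else False
    after = md5_of(path)
    return before, after, dropped
```

If the two checksums differ, the cached copy and the on-disk copy disagree; if they match, the reread either came from disk correctly or the pages were never dropped.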

-Eric

2009-06-05 16:41:45

by Alan Jenkins

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Eric Sandeen wrote:
> Alan Jenkins wrote:
>
>> Aioanei Rares wrote:
>>
>
>
>>> I suspect, although I might be wrong, that this is not a kernel-related
>>> problem.
>>>
>> "To try and rule out a faulty userspace program, I marked the file as
>> read-only (chmod a-w) and immutable (chattr +i). After a reboot, the
>> file was still read-only and immutable, yet it still became corrupted."
>>
>> Since the immutable bit is not respected, I tend to think it is a kernel
>> problem. Unless the filesystem isn't getting unmounted/flushed properly
>> for some reason... but I thought the modern kernel had that covered.
>>
>> I agree it is very suspicious this happens only after upgrading libc.
>> I'll see if I can find an individual change in libc locale-handling that
>> might trigger this.
>>
>
> Maybe you could try some things in your shutdown script, such as
> explicitly fsyncing the file, or bmapping it with filefrag, or dropping
> caches and rereading it... see what the state is just before the
> shutdown compared to after the reboot.
>
> -Eric
>

Dropping caches (and running sync first) had no effect on the result of
md5sum. Hopefully that narrows it down a bit.

Thanks to your prodding though, I have another interesting finding:

If I remove the corrupted file and copy a "known good" copy into its
place, then the corruption doesn't happen. I've verified this a couple
of times. The corruption only occurs if the file was created by
"locale-gen".

I'll continue to try to work out why :-).

Thanks
Alan

2009-06-05 16:51:50

by Kay Sievers

Subject: Re: Mild filesystem corruption on ext4 (no journal)

On Fri, Jun 5, 2009 at 18:43, Alan Jenkins<[email protected]> wrote:

> If I remove the corrupted file and copy a "known good" copy into its place,
> then the corruption doesn't happen. I've verified this a couple of times.
> The corruption only occurs if the file was created by "locale-gen".

Does it use things like: fallocate()?

Kay

2009-06-05 18:01:31

by Theodore Ts'o

Subject: Re: Mild filesystem corruption on ext4 (no journal)

On Fri, Jun 05, 2009 at 05:40:33PM +0300, Aioanei Rares wrote:
>> When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable),
>> the locale breaks every reboot, and I have to repair it by running
>> locale-gen. This happened now when I only upgraded libc, in order to
>> play with signalfd(). It also happened before, when I upgraded the
>> entire machine to debian unstable (which I later reverted).
>>
>> The problem is that /usr/lib/locale/locale-archive gets corrupted when
>> I reboot. The exact corruption differs with each reboot (i.e. the
>> md5sum differs). Last time, the first ~70K was overwritten with data
>> from xorg.log and my web browsing history. I have copies of the
>> original and corrupted state which I can send, the full file is 1.3
>> megs, but I can limit it to the first 70K, since that's all that was
>> corrupted.

> I suspect, although I might be wrong, that this is not a kernel-related
> problem.

Actually, I suspect it is indeed a kernel-related problem. The
problem has been reported before, with a repeatable test case:

http://bugzilla.kernel.org/show_bug.cgi?id=13292

The problem shows up after you unmount and remount the filesystem.
Before the filesystem is unmounted, the locale-archive file has
the correct md5sum. After you unmount and remount the filesystem, the
filesystem is corrupted. I'm guessing that some data blocks aren't
getting marked as needing writeback, so the previous contents on disk
aren't written back. I was able to show that even though the mounted
filesystem had the correct information, direct access to the disk
using debugfs showed the blocks on disk had the contents that would be
revealed after the filesystem was unmounted and remounted.
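
The comparison between the mounted view and the on-disk view could be scripted roughly as follows. This is a sketch under assumptions: it relies on debugfs's `-R` option and its `cat` request to dump a file's blocks straight from the device, the helper names are hypothetical, and any device name passed in (for example `/dev/sda1`) is a placeholder. Paths containing spaces would need extra quoting.

```python
import hashlib
import subprocess

def debugfs_cat_argv(device, path_in_fs):
    """Build the debugfs command that dumps one file's contents from the device."""
    return ["debugfs", "-R", f"cat {path_in_fs}", device]

def on_disk_md5(device, path_in_fs):
    """Checksum the file as debugfs reads it from the block device (needs root)."""
    out = subprocess.run(debugfs_cat_argv(device, path_in_fs),
                         check=True, capture_output=True).stdout
    return hashlib.md5(out).hexdigest()

def cached_md5(path):
    """Checksum the file as seen through the mounted filesystem."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()
```

A mismatch between `cached_md5()` and `on_disk_md5()` while the filesystem is still mounted would match the situation described above: correct data in memory, stale blocks on disk.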

The problem only shows up when using ext4 without a journal, and I was
never able to create a simpler reproduction case. The last time I
tried to work on this bug was approximately a month ago. About two
weeks ago Frank from Google tried reproducing it, but he wasn't able
to do so using his 2.6.26-based kernel plus an updated ext4.
Unfortunately, I haven't had time to look at it since then, or to
check to see if some of the more recent patches scheduled for the
2.6.31 merge window might have changed the behaviour of this bug.

- Ted

2009-06-05 18:12:50

by Eric Sandeen

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Alan Jenkins wrote:
> Eric Sandeen wrote:

>> Maybe you could try some things in your shutdown script, such as
>> explicitly fsyncing the file, or bmapping it with filefrag, or dropping
>> caches and rereading it... see what the state is just before the
>> shutdown compared to after the reboot.
>>
>> -Eric
>>
>
> Dropping caches (and running sync first) had no effect on the result of
> md5sum. Hopefully that narrows it down a bit.

And did the reread after dropping caches have the right data?

Did the block numbers reported by filefrag -v change post-boot?

-Eric

2009-06-05 21:31:21

by Alan Jenkins

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Eric Sandeen wrote:
> Alan Jenkins wrote:
>
>> Eric Sandeen wrote:
>>
>
>
>>> Maybe you could try some things in your shutdown script, such as
>>> explicitly fsyncing the file, or bmapping it with filefrag, or dropping
>>> caches and rereading it... see what the state is just before the
>>> shutdown compared to after the reboot.
>>>
>>> -Eric
>>>
>>>
>> Dropping caches (and running sync first) had no effect on the result of
>> md5sum. Hopefully that narrows it down a bit.
>>
>
> And did the reread after dropping caches have the right data?
>

Yes.

> Did the block numbers reported by filefrag -v change post-boot?
>

Oh, I didn't understand that's what you were asking for.

The bug report Ted linked to says it's (most likely) a writeback issue.
In which case I think the block numbers won't change. I'll check
tomorrow, and follow-up if it turns up any unexpected result.

There's also speculation that it's a core kernel issue, something that
changed since 2.6.26. Perhaps that explains how remount-ro + sync +
drop_caches can leave the correct data sitting in the pagecache, without
either writing it to disk or dropping it.

Thanks
Alan

2009-06-05 21:33:18

by Alan Jenkins

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Theodore Tso wrote:
> On Fri, Jun 05, 2009 at 05:40:33PM +0300, Aioanei Rares wrote:
>
>>> When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable),
>>> the locale breaks every reboot, and I have to repair it by running
>>> locale-gen. This happened now when I only upgraded libc, in order to
>>> play with signalfd(). It also happened before, when I upgraded the
>>> entire machine to debian unstable (which I later reverted).
>>>
>>> The problem is that /usr/lib/locale/locale-archive gets corrupted when
>>> I reboot. The exact corruption differs with each reboot (i.e. the
>>> md5sum differs). Last time, the first ~70K was overwritten with data
>>> from xorg.log and my web browsing history. I have copies of the
>>> original and corrupted state which I can send, the full file is 1.3
>>> megs, but I can limit it to the first 70K, since that's all that was
>>> corrupted.
>>>
>
>
>> I suspect, although I might be wrong, that this is not a kernel-related
>> problem.
>>
>
> Actually, I suspect it is indeed a kernel-related problem. The
> problem has been reported before, with a repeatable test case:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13292
>
> The problem shows up after you unmount and remount the filesystem.
> Before the filesystem is unmounted, the locale-archive file has
> the correct md5sum. After you unmount and remount the filesystem, the
> filesystem is corrupted. I'm guessing that some data blocks aren't
> getting marked as needing writeback, so the previous contents on disk
> aren't written back. I was able to show that even though the mounted
> filesystem had the correct information, direct access to the disk
> using debugfs showed the blocks on disk had the contents that would be
> revealed after the filesystem was unmounted and remounted.
>
> The problem only shows up when using ext4 without a journal, and I was
> never able to create a simpler reproduction case. The last time I
> tried to work on this bug was approximately a month ago. About two
> weeks ago Frank from Google tried reproducing it, but he wasn't able
> to do so using his 2.6.26-based kernel plus an updated ext4.
> Unfortunately, I haven't had time to look at it since then, or to
> check to see if some of the more recent patches scheduled for the
> 2.6.31 merge window might have changed the behaviour of this bug.
>
> - Ted
>

Well, thanks for the link + explanation! I look forward to the eventual
solution.
Alan

2009-06-05 21:41:24

by Alan Jenkins

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Kay Sievers wrote:
> On Fri, Jun 5, 2009 at 18:43, Alan Jenkins<[email protected]> wrote:
>
>
>> If I remove the corrupted file and copy a "known good" copy into its place,
>> then the corruption doesn't happen. I've verified this a couple of times.
>> The corruption only occurs if the file was created by "locale-gen".
>>
>
> Does it use things like: fallocate()?
>
> Kay
>

Not quite. Ted says it's already reported as

http://bugzilla.kernel.org/show_bug.cgi?id=13292

It includes an strace log which implicates the use of both mmap() and
write() on the same file. Sounds scary :-).
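
To make the access pattern concrete, here is a generic sketch of one program updating the same file through both a shared mapping and write(). It is not the sequence from the strace log in the bug report, and it is not claimed to reproduce the corruption; the function name and the three-page layout are assumptions chosen for illustration.

```python
import mmap
import os

def mix_mmap_and_write(path, pages=3):
    """Create a file with write(), then update it via a shared mapping and pwrite()."""
    page = mmap.PAGESIZE
    size = pages * page
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, b"A" * size)                       # initial contents via write()
        m = mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                      prot=mmap.PROT_READ | mmap.PROT_WRITE)
        m[0:page] = b"B" * page                         # dirty the first page via the mapping
        os.pwrite(fd, b"C" * page, (pages - 1) * page)  # overwrite the last page via write()
        m.flush()                                       # msync() the mapping
        m.close()
        os.fsync(fd)
    finally:
        os.close(fd)
    # Contents a correct filesystem should return, before and after a remount.
    return b"B" * page + b"A" * (size - 2 * page) + b"C" * page
```

Comparing the returned bytes with the file contents after an unmount and remount would be the interesting check on an affected system.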

Alan

2009-06-05 21:42:19

by Curt Wohlgemuth

Subject: Re: Mild filesystem corruption on ext4 (no journal)

On Fri, Jun 5, 2009 at 11:01 AM, Theodore Tso<[email protected]> wrote:
> On Fri, Jun 05, 2009 at 05:40:33PM +0300, Aioanei Rares wrote:
>>> When I upgrade libc from 2.7 (debian stable) to 2.9 (debian unstable),
>>> the locale breaks every reboot, and I have to repair it by running
>>> locale-gen. This happened now when I only upgraded libc, in order to
>>> play with signalfd(). It also happened before, when I upgraded the
>>> entire machine to debian unstable (which I later reverted).
>>>
>>> The problem is that /usr/lib/locale/locale-archive gets corrupted when
>>> I reboot. The exact corruption differs with each reboot (i.e. the
>>> md5sum differs). Last time, the first ~70K was overwritten with data
>>> from xorg.log and my web browsing history. I have copies of the
>>> original and corrupted state which I can send, the full file is 1.3
>>> megs, but I can limit it to the first 70K, since that's all that was
>>> corrupted.
>
>> I suspect, although I might be wrong, that this is not a kernel-related
>> problem.
>
> Actually, I suspect it is indeed a kernel-related problem. The
> problem has been reported before, with a repeatable test case:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13292
>
> The problem shows up after you unmount and remount the filesystem.
> Before the filesystem is unmounted, the locale-archive file has
> the correct md5sum. After you unmount and remount the filesystem, the
> filesystem is corrupted. I'm guessing that some data blocks aren't
> getting marked as needing writeback, so the previous contents on disk
> aren't written back. I was able to show that even though the mounted
> filesystem had the correct information, direct access to the disk
> using debugfs showed the blocks on disk had the contents that would be
> revealed after the filesystem was unmounted and remounted.
>
> The problem only shows up when using ext4 without a journal, and I was
> never able to create a simpler reproduction case. The last time I
> tried to work on this bug was approximately a month ago. About two
> weeks ago Frank from Google tried reproducing it, but he wasn't able
> to do so using his 2.6.26-based kernel plus an updated ext4.
> Unfortunately, I haven't had time to look at it since then, or to
> check to see if some of the more recent patches scheduled for the
> 2.6.31 merge window might have changed the behaviour of this bug.

Just FYI: Frank Mayhar has recreated this issue in a recent kernel
(though we're not seeing it with our 2.6.26 kernel + ext4 patches),
and is actively working on it.

Curt


2009-06-05 22:02:31

by Eric Sandeen

Subject: Re: Mild filesystem corruption on ext4 (no journal)

Alan Jenkins wrote:
> Eric Sandeen wrote:
>> Alan Jenkins wrote:

...

>> And did the reread after dropping caches have the right data?
>>
>
> Yes.
>
>> Did the block numbers reported by filefrag -v change post-boot?
>>
>
> Oh, I didn't understand that's what you were asking for.

Yeah, after I saw that bug it does seem to be solely a data flushing
issue. I was trying the testcase and looking at what is on-disk in the
original image, what's on-disk after the unmount, and what is seen in
the chroot, for the file in question.... 3 different answers.

(and oddly enough dropping caches doesn't show it; you have to
unmount/remount)

-Eric

2009-06-06 04:17:29

by Henrique de Moraes Holschuh

Subject: Re: Mild filesystem corruption on ext4 (no journal)

On Fri, 05 Jun 2009, Alan Jenkins wrote:
> Not quite. Ted says it's already reported as
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13292
>
> It includes an strace log which implicates the use of both mmap() and
> write() on the same file. Sounds scary :-).

Run Cyrus-IMAPd on that. It will cause the entire house to come down if
you have any sort of misbehaviour when userspace is doing extremely
annoying things. Cyrus-imapd is the mmap() test from hell.

--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh

2009-06-09 04:15:04

by Michael Rubin

Subject: Re: Mild filesystem corruption on ext4 (no journal)

On Fri, Jun 5, 2009 at 2:42 PM, Curt Wohlgemuth<[email protected]> wrote:
> Just FYI: Frank Mayhar has recreated this issue in a recent kernel
> (though we're not seeing it with our 2.6.26 kernel + ext4 patches),
> and is actively working on it.

Sorry about this. We had a bunch of sick engineers in the last week or
so (including Frank). Hopefully he will be able to devote time to this
bug as he is getting back. We need it fixed for ourselves so we are
going to root cause it if no one beats us to it.

mrubin