2009-05-13 09:22:42

by bugzilla-daemon

Subject: [Bug 13292] New: ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292

Summary: ext4 without journal reproductible file corruption
Product: File System
Version: 2.5
Kernel Version: 2.6.29.3
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: blocking
Priority: P1
Component: ext4
AssignedTo: [email protected]
ReportedBy: [email protected]
Regression: No


Created an attachment (id=21323)
--> (http://bugzilla.kernel.org/attachment.cgi?id=21323)
The harddrive image to mount and chroot

Hi,

I found file corruption using ext4 without a journal on two different
machines: a Dell Vostro 1700 (64-bit Arch Linux and 64-bit Jaunty, custom
2.6.29.3 kernel) and an Acer Aspire One A110 (SSD, 32-bit Arch Linux, same
kernel version).


The problem can be reproduced using the attached hard-drive image. It is an
ext4 image, without a journal, of a minimal 64-bit Arch Linux system.

Steps to reproduce:

A-

1/ extract the archive on a 64-bit system running 2.6.29.3
2/ mount the .img as ext4, loopback
3/ chroot into the mounted directory
4/ execute "locale-gen"
5/ execute "locale". The result is OK
6/ have a look at the /usr/lib/locale/locale-archive file (with vi or another editor)
7/ exit the chroot
8/ umount the hd image

B-

1/ mount again the hd image
2/ chroot again
3/ execute "locale". There are 3 errors at the beginning, and the output is not
the same as before
4/ look at /usr/lib/locale/locale-archive: the content is not the same!


Repeat these steps with a journal (tune2fs -O has_journal, fsck -f) and
you'll see that the locale-archive file and the output of the locale command
are consistent across umount/mount.
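
The steps above can be sketched as a small shell script. This is only a rough
sketch, not the reporter's actual procedure: the function names (record_sum,
verify_sum, reproduce) are made up for illustration, it assumes GNU md5sum, and
reproduce must run as root on a kernel with ext4 and loop support. Instead of
inspecting the file in an editor, it compares checksums taken before and after
the remount.

```shell
# record_sum FILE SUMFILE: save FILE's MD5 checksum for a later check.
record_sum() {
    md5sum "$1" > "$2"
}

# verify_sum SUMFILE: succeed only if the file still has the recorded checksum.
verify_sum() {
    md5sum --status -c "$1"
}

# reproduce IMG MNT: run steps A and B against a loop-mounted image (needs root).
reproduce() {
    img=$1 mnt=$2
    archive=$mnt/usr/lib/locale/locale-archive
    sums=$(mktemp) || return 2

    mount -t ext4 -o loop "$img" "$mnt" || return 2
    chroot "$mnt" locale-gen
    record_sum "$archive" "$sums"      # checksum right after generation
    umount "$mnt"

    mount -t ext4 -o loop "$img" "$mnt" || return 2
    verify_sum "$sums"                 # fails if the file changed across the remount
    status=$?
    umount "$mnt"
    rm -f "$sums"
    return "$status"
}
```

Calling, for example, reproduce archlinux_64_minimal_ext4_without_journal.img
/mnt/loop would return non-zero when the file differs after the remount, which
is the behaviour described in step B.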

The locale-archive file is not the only one affected. On another system, it was
the nvidia kernel module that was corrupted between installation and reboot.


Regards,

Thibault

--
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


2009-05-13 23:10:14

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292


Frank Mayhar <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #1 from Frank Mayhar <[email protected]> 2009-05-13 23:10:14 ---
I would be interested to know what your actual output looks like. I've tried
to reproduce this and don't seem to be able to, at least not in my environment.

My attempts looked like:
[/root]# mount -t ext4 -o loop /foo/archlinux_64_minimal_ext4_without_journal.img /mnt
[/root]# chroot /mnt
bash-3.2# locale-gen
Generating locales...
fr_FR.UTF-8... done
fr_FR.ISO-8859-1... done
[email protected] done
Generation complete.
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
bash-3.2# LANG=fr_FR
bash-3.2# locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
bash-3.2# exit
[/root]# umount /mnt
[/root]# mount -t ext4 -o loop /foo/archlinux_64_minimal_ext4_without_journal.img /mnt
[/root]# chroot /mnt
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_US
LC_CTYPE="en_US"
LC_NUMERIC="en_US"
LC_TIME="en_US"
LC_COLLATE="en_US"
LC_MONETARY="en_US"
LC_MESSAGES="en_US"
LC_PAPER="en_US"
LC_NAME="en_US"
LC_ADDRESS="en_US"
LC_TELEPHONE="en_US"
LC_MEASUREMENT="en_US"
LC_IDENTIFICATION="en_US"
LC_ALL=
bash-3.2# LANG=fr_FR
bash-3.2# locale
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=
bash-3.2#


2009-05-14 14:14:00

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292





--- Comment #2 from Thibault Mondary <[email protected]> 2009-05-14 14:14:01 ---
(In reply to comment #1)
I'm starting from the locale "[email protected]" on my system.


Output WITHOUT journal :

[email protected] ~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
[email protected] ~# chroot /mnt/loop

bash-3.2# locale-gen
Generating locales...
fr_FR.UTF-8... done
fr_FR.ISO-8859-1... done
[email protected] done
Generation complete.

bash-3.2# locale
[email protected]
LC_CTYPE="[email protected]"
LC_NUMERIC="[email protected]"
LC_TIME="[email protected]"
LC_COLLATE="[email protected]"
LC_MONETARY="[email protected]"
LC_MESSAGES="[email protected]"
LC_PAPER="[email protected]"
LC_NAME="[email protected]"
LC_ADDRESS="[email protected]"
LC_TELEPHONE="[email protected]"
LC_MEASUREMENT="[email protected]"
LC_IDENTIFICATION="[email protected]"
LC_ALL=

bash-3.2# exit
[email protected]:~# umount /mnt/loop

[email protected]:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
[email protected] ~# chroot /mnt/loop

********BEGINNING OF THE PROBLEM: the locales should still be there from the
previous mount**********

bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
[email protected]
LC_CTYPE="[email protected]"
LC_NUMERIC="[email protected]"
LC_TIME="[email protected]"
LC_COLLATE="[email protected]"
LC_MONETARY="[email protected]"
LC_MESSAGES="[email protected]"
LC_PAPER="[email protected]"
LC_NAME="[email protected]"
LC_ADDRESS="[email protected]"
LC_TELEPHONE="[email protected]"
LC_MEASUREMENT="[email protected]"
LC_IDENTIFICATION="[email protected]"
LC_ALL=

bash-3.2# LANG=fr_FR
bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=fr_FR
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=

**********END************


*************VERSION 2 : adding a journal to the image ************

[email protected]:~# tune2fs -O has_journal archlinux_64_minimal_ext4_without_journal.img
tune2fs 1.41.4 (27-Jan-2009)
Creating journal inode: done
This filesystem will be automatically checked every 39 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.

[email protected]:~# e2fsck -f archlinux_64_minimal_ext4_without_journal.img
e2fsck 1.41.4 (27-Jan-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
archlinux_64_minimal_ext4_without_journal.img: 21848/65536 files (0.1%
non-contiguous), 86608/262144 blocks

[email protected]:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
[email protected]:~# chroot /mnt/loop

bash-3.2# locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
[email protected]
LC_CTYPE="[email protected]"
LC_NUMERIC="[email protected]"
LC_TIME="[email protected]"
LC_COLLATE="[email protected]"
LC_MONETARY="[email protected]"
LC_MESSAGES="[email protected]"
LC_PAPER="[email protected]"
LC_NAME="[email protected]"
LC_ADDRESS="[email protected]"
LC_TELEPHONE="[email protected]"
LC_MEASUREMENT="[email protected]"
LC_IDENTIFICATION="[email protected]"
LC_ALL=

bash-3.2# locale-gen
Generating locales...
fr_FR.UTF-8... done
fr_FR.ISO-8859-1... done
[email protected] done
Generation complete.

bash-3.2# locale
[email protected]
LC_CTYPE="[email protected]"
LC_NUMERIC="[email protected]"
LC_TIME="fr_F[email protected]"
LC_COLLATE="[email protected]"
LC_MONETARY="[email protected]"
LC_MESSAGES="[email protected]"
LC_PAPER="[email protected]"
LC_NAME="[email protected]"
LC_ADDRESS="[email protected]"
LC_TELEPHONE="[email protected]"
LC_MEASUREMENT="[email protected]"
LC_IDENTIFICATION="[email protected]"
LC_ALL=

bash-3.2# exit
[email protected]:~# umount /mnt/loop

[email protected]:~# mount archlinux_64_minimal_ext4_without_journal.img /mnt/loop -o loop
[email protected]:~# chroot /mnt/loop
*****************HERE, NO PROBLEM, locales are taken from previous
session*******

bash-3.2# locale
[email protected]
LC_CTYPE="[email protected]"
LC_NUMERIC="[email protected]"
LC_TIME="[email protected]"
LC_COLLATE="[email protected]"
LC_MONETARY="[email protected]"
LC_MESSAGES="[email protected]"
LC_PAPER="[email protected]"
LC_NAME="[email protected]"
LC_ADDRESS="[email protected]"
LC_TELEPHONE="[email protected]"
LC_MEASUREMENT="[email protected]"
LC_IDENTIFICATION="[email protected]"
LC_ALL=


exit, umount...








2009-05-14 20:07:25

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292





--- Comment #3 from Frank Mayhar <[email protected]> 2009-05-14 20:07:25 ---
Thank you for the very detailed information you provided. Unfortunately I
can't reproduce this problem in my environment. That environment is, however,
somewhat different from yours, in that it's a 2.6.26 kernel plus ext4 patches
up to March 16 (minus a few patches that depended on changes in other parts of
the kernel).

My suggestion is that you drop back to the March 16 (or so) kernel and see if
you can reproduce the problem. If you can't reproduce it then you can do a
binary search to see where the problem started, i.e. try an April 15 kernel,
etc., until you home in on the offending commit.

If you _can_ reproduce the problem with the March 16 kernel then something more
complex is going on and I'll have to try again to reproduce it here.
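
The binary search suggested above can be sketched in plain shell. This is a
minimal illustration under stated assumptions, not a tool used in this thread:
first_bad and is_bad are hypothetical names, the candidates must be ordered
oldest to newest, the first must be known good and the last known bad, and
is_bad stands in for "build and boot this kernel, then check whether the
corruption reproduces". On a git tree, git bisect automates the same search.

```shell
# first_bad IS_BAD CANDIDATE...
# Prints the first candidate for which IS_BAD succeeds.
# Assumptions: candidates are ordered oldest to newest, the first is known
# good, the last is known bad, and every candidate after the first bad one is
# also bad. IS_BAD is a hypothetical caller-supplied command or function that
# tests one candidate and exits 0 when the corruption reproduces.
first_bad() {
    is_bad=$1
    shift
    lo=1        # position of a known-good candidate
    hi=$#       # position of a known-bad candidate
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"       # fetch the mid-th positional argument
        if "$is_bad" "$candidate"; then
            hi=$mid                     # first bad one is at or before mid
        else
            lo=$mid                     # first bad one is after mid
        fi
    done
    eval "printf '%s\n' \"\${$hi}\""
}
```

With n candidates this needs roughly log2(n) test runs rather than n, which
matters when each run means building and booting a kernel. The function uses
global variables to stay within POSIX sh, so is_bad should not reuse the names
lo, hi, mid or candidate.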


2009-05-20 01:02:32

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292


Theodore Tso <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #4 from Theodore Tso <[email protected]> 2009-05-20 01:02:32 ---
I've been able to replicate the problem using a 2.6.30-rc6 kernel with the ext4
patch queue applied.

It seems to be utterly repeatable, and it seems to have to do with how the
locale-gen program writes out /usr/lib/locale/locale-archive. After you run
locale-gen, an md5sum of that file gives you:

e98e9a55061c63f7ae089f7ac016eac6 /mnt/usr/lib/locale/locale-archive

but after you unmount and remount the filesystem, an md5 of that file gives
you:

5ab6d62d18431d057a514eb7dbd78428 /mnt/usr/lib/locale/locale-archive

If I manually copy the file into place, it seems to be OK. So it must be in
how the file gets copied into place.

Unfortunately the image doesn't have strace, but I've tried stracing locale-gen
on a (32-bit x86) Ubuntu system, and it appears that locale-gen modifies the
file by using a combination of mmap as well as direct writes (?!?):

28124 open("/usr/lib/locale/locale-archive", O_RDWR|O_LARGEFILE) = 3
28124 fstat64(3, {st_mode=S_IFREG|0644, st_size=1330544, ...}) = 0
28124 fcntl64(3, F_SETLKW64, {type=F_WRLCK, whence=SEEK_CUR, start=0, len=56}, 0xfffb3f20) = 0
28124 stat64("/usr/lib/locale/locale-archive", {st_mode=S_IFREG|0644, st_size=1330544, ...}) = 0
28124 read(3, "\t\1\2\336\0\0\0\0008\0\0\0\2\0\0\0\213\3\0\0\274*\0\0\26\0\0\0L\35\0\0\10"..., 56) = 56
28124 mmap2(NULL, 103860, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0xf6d58000
28124 _llseek(3, 0, [1330544], SEEK_END) = 0
28124 write(3, "\27\20\5
\23\0\0\0T\0\0\0X\0\0\0d\0\0\0d\4\0\0\0\202\2\0p\235\2\0|"..., 962094) = 962094
28124 _llseek(3, 0, [2292638], SEEK_END) = 0
28124 write(3, "\0\0"..., 2) = 2
28124 write(3, "\24\21\3 \6\0\0\0
\0\0\0\"\0\0\0$\0\0\0(\0\0\0,\0\0\0000\0\0\0."..., 3584) = 3584
28124 munmap(0xf6d58000, 103860) = 0
28124 close(3) = 0

All I can posit is that somehow some dirty bits aren't getting set, so some
data blocks aren't getting written back to disk, and the stale contents show up
once the filesystem is unmounted and remounted. Using debugfs to look at the
file, it does indeed look like the blocks on disk are never getting written
out. Using debugfs "dump /usr/lib/locale/locale-archive /tmp/foo", I'm seeing
the contents of what we see after the filesystem is unmounted and remounted.
It's not at all clear why not using a journal makes a difference, though.
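
The debugfs check described above could be scripted roughly as follows. This is
a sketch under assumptions, not the exact commands used here: same_bytes and
on_disk_matches are hypothetical helper names, and it relies on debugfs -R to
run a single "dump" request against the image while the filesystem is still
mounted, so the on-disk blocks can be compared with the file as seen through
the mount.

```shell
# same_bytes A B: succeed only if the two files are byte-for-byte identical.
same_bytes() {
    cmp -s "$1" "$2"
}

# on_disk_matches IMG PATH_IN_FS MOUNTED_FILE
# Dumps PATH_IN_FS straight from the image with debugfs (bypassing the mounted
# filesystem) and compares it with the file as seen through the mount point.
on_disk_matches() {
    img=$1 path_in_fs=$2 mounted_file=$3
    dump=$(mktemp) || return 2
    debugfs -R "dump $path_in_fs $dump" "$img" >/dev/null 2>&1
    same_bytes "$dump" "$mounted_file"
    status=$?
    rm -f "$dump"
    return "$status"
}
```

For example, on_disk_matches image.img /usr/lib/locale/locale-archive
/mnt/usr/lib/locale/locale-archive, run after locale-gen and before umount,
would return non-zero if the new data never reached the disk. Running sync
first would help rule out ordinary delayed writeback as the cause of a
mismatch.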

I've tried running fsx on a filesystem without a journal, and it's not showing
the problem.


2009-05-20 17:36:25

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292





--- Comment #5 from Frank Mayhar <[email protected]> 2009-05-20 17:36:26 ---
Just FYI, as I said I was unable to reproduce this in our current environment.
We'll shortly be pulling in another set of patches from the ext4 stable tree
(bringing us from March to whatever is current at the time), at which point
I'll try again. If I'm successful I'll do a bisection to figure out which
patch or patches introduced the problem.

Assuming Ted doesn't beat me to it, anyway. :-)


2009-05-27 19:23:41

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292





--- Comment #6 from Frank Mayhar <[email protected]> 2009-05-27 19:23:42 ---
Well, I pulled in a bunch of patches from the ext4 stable tree (although not
all of them, unfortunately, since our base is well behind top-of-tree) and was
still unable to reproduce this. I'll continue to keep an eye on it, however.


2009-05-28 21:16:08

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292





--- Comment #7 from Frank Mayhar <[email protected]> 2009-05-28 21:16:09 ---
Yesterday I pulled down Ted's ext4-stable tree and built it. Today I used it
to actually reproduce this problem. Now to try to track it down...

BTW, my suspicion is that the problem is either somewhere in the rest of the
kernel or maybe some post-2.6.26 change elsewhere is tickling a problem in ext4
itself. I suspect this because we're very nearly up-to-date with ext4 itself
(modulo some patches that don't seem directly relevant to this issue) but the
rest of our kernel is still pretty much straight 2.6.26. If it were strictly
an ext4 issue I would think we would be able to reproduce it with our kernel,
but we can't.

I'm going to work on tracking it down; I'm posting here just to make sure I'm
not duplicating someone else's effort. Ted?


2009-07-21 09:02:46

by bugzilla-daemon

Subject: [Bug 13292] ext4 without journal reproductible file corruption

http://bugzilla.kernel.org/show_bug.cgi?id=13292


Thibault Mondary <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |PATCH_ALREADY_AVAILABLE




--- Comment #8 from Thibault Mondary <[email protected]> 2009-07-21 09:02:44 ---
Hi,

I tested again using 2.6.31-rc3, and the bug seems to be resolved, no more
corruption using locale-gen.


Thibault
