2010-01-08 02:54:28

by Tetsuo Handa

Subject: [2.6.30 and later] file corruption on ext3 filesystem.

Hello.

I'm experiencing a file corruption problem.
Can somebody reproduce the result below?

My environment:
VMware Workstation 6.5.3 with 2CPUs / 512MB RAM.
ext3 filesystem ( /dev/sda1 ) mounted on / .

2.6.33-rc3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.33-rc3-ext3 )
2.6.32.3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.32.3-ext3 )
2.6.31.11 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.31.11-ext3 )
2.6.30.10

So far, I haven't been able to reproduce this problem on 2.6.29 and earlier.
Maybe this problem exists only in 2.6.30 and later.

Steps to reproduce:

Compile the program below using "gcc -Wall -O3 -o a.out".

----------
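#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    FILE *fp = fopen("/testfile", "a");
    char buffer[4096];

    if (!fp) {
        perror("/testfile");
        return 1;
    }
    /* One 4096-byte record: 4095 fill bytes (default: space) + '\n'. */
    memset(buffer, argc > 1 ? argv[1][0] : 0x20, sizeof(buffer));
    buffer[sizeof(buffer) - 1] = '\n';
    fwrite(buffer, 1, sizeof(buffer), fp);
    fflush(fp);
    /* Wait out kjournald's 5-second commit interval. */
    sleep(5);
    fprintf(stderr, "Let power fail after a few seconds.\n");
    /* Keep appending, without fsync(), until the power is cut. */
    while (1) {
        sleep(1);
        fwrite(buffer, 1, sizeof(buffer), fp);
    }
    return 0;
}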
----------

Reboot the system by executing /sbin/reboot .

Run ./a.out and let the power fail (i.e. unplug the power cable
or do the equivalent) more than 5 seconds after starting it (i.e. longer
than kjournald's commit interval). Cutting the power 2 or 3 seconds after
"Let power fail after a few seconds.\n" is printed probably works best.

Restart the system (and fsck will be executed).

Run "cat /testfile". It should contain only lines of 4095 spaces + '\n'
(or the byte specified via argv[]). But it contains different data.
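
Reading the output of "cat /testfile" by eye is error-prone, so the check above could be automated. The following is only a sketch and not part of the original report: the helper name first_bad_record() is hypothetical, and it assumes /testfile was removed before the run so that every record uses the same fill byte.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of the original test program.
 * Returns 0 if every 4096-byte record in `path` is 4095 copies of `fill`
 * followed by '\n', the 1-based number of the first record that differs,
 * or -1 if the file cannot be opened. A short trailing record is treated
 * as a mismatch; an empty file yields 0.
 */
long first_bad_record(const char *path, char fill)
{
    FILE *fp = fopen(path, "rb");
    char buf[4096];
    long record = 0;
    size_t n;

    if (!fp)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        record++;
        /* Each record must be full-sized and end with a newline. */
        if (n != sizeof(buf) || buf[sizeof(buf) - 1] != '\n')
            goto bad;
        /* Everything before the newline must be the fill byte. */
        for (size_t i = 0; i < sizeof(buf) - 1; i++)
            if (buf[i] != fill)
                goto bad;
    }
    fclose(fp);
    return 0;
bad:
    fclose(fp);
    return record;
}
```

After the reboot, calling first_bad_record("/testfile", ' ') from a small main() (or with the byte passed to ./a.out) would return a nonzero record number if stale data ended up in the file.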

This problem does not show up if the data written by ./a.out and the data in
previously deleted files are identical. Therefore, you may want to try with
different patterns like "./a.out 1" "./a.out 2" "./a.out 3" ...

Regards.


2010-01-08 04:07:42

by Jamie Lokier

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

Tetsuo Handa wrote:
> VMware Workstation 6.5.3 with 2CPUs / 512MB RAM.
> ext3 filesystem ( /dev/sda1 ) mounted on / .

> Run ./a.out and let the power fail (i.e. unplug the electric cable
> or do equivalent) after more than 5 seconds (i.e. longer than kjournald's
> commit interval). Probably 2 or 3 seconds after
> "Let power fail after a few seconds.\n" was printed is the best.

It could be a kernel problem, but it could also be caused by VMware.

The combination of journalling, barriers, committing data safely to
disk and VMs is complicated and does not always provide filesystem
integrity on power failure, depending on many factors.

However, if you see corruption only with 2.6.30 and later, but not
with 2.6.29, that does suggest a kernel problem.

-- Jamie

2010-01-08 12:36:49

by Dave Chinner

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

On Fri, Jan 08, 2010 at 11:54:24AM +0900, Tetsuo Handa wrote:
> Hello.
>
> I'm experiencing file corruption problem.
> Can somebody reproduce below result?
>
> My environment:
> VMware Workstation 6.5.3 with 2CPUs / 512MB RAM.
> ext3 filesystem ( /dev/sda1 ) mounted on / .
>
> 2.6.33-rc3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.33-rc3-ext3 )
> 2.6.32.3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.32.3-ext3 )
> 2.6.31.11 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.31.11-ext3 )
> 2.6.30.10
>
> So far, I haven't succeeded to reproduce this problem for 2.6.29 and earlier.
> Maybe this problem exists in only 2.6.30 and later.

Isn't that when the default mount options changed from data=ordered to
data=writeback?

> Steps to reproduce:
>
> Compile below program using "gcc -Wall -O3 -o a.out".
>
> ----------
> #include <stdio.h>
> #include <string.h>
> #include <unistd.h>
>
> int main(int argc, char *argv[])
> {
> FILE *fp = fopen("/testfile", "a");
> char buffer[4096];
> memset(buffer, argc > 1 ? argv[1][0] : 0x20, sizeof(buffer));
> buffer[sizeof(buffer) - 1] = '\n';
> fwrite(buffer, 1, sizeof(buffer), fp);
> fflush(fp);
> sleep(5);
> fprintf(stderr, "Let power fail after a few seconds.\n");
> while (1) {
> sleep(1);
> fwrite(buffer, 1, sizeof(buffer), fp);
> }
> return 0;
> }
> ----------
>
> Reboot the system by executing /sbin/reboot .
>
> Run ./a.out and let the power fail (i.e. unplug the electric cable
> or do equivalent) after more than 5 seconds (i.e. longer than kjournald's
> commit interval). Probably 2 or 3 seconds after
> "Let power fail after a few seconds.\n" was printed is the best.
>
> Restart the system (and fsck will be executed).
>
> Run "cat /testfile". It should contain only lines of 4095 spaces + '\n'
> (or the byte specified via argv[]). But it contains different data.

You didn't fsync() it, so there is no reason for the kernel
to have ever written it to disk. Therefore the result after powerfail
is completely undefined - your data may be there, it may not...
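
For illustration, a minimal sketch of what asking for durability would look like. This is not the reporter's program: append_durably() is a hypothetical helper, and whether the data really reaches the physical disk still depends on barriers and on the VM, as noted earlier in the thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the original test program.
 * Appends `len` bytes to `path` and asks the kernel to commit them to
 * stable storage. Returns 0 on success, -1 on any failure.
 */
int append_durably(const char *path, const void *data, size_t len)
{
    FILE *fp = fopen(path, "a");
    int ok;

    if (!fp)
        return -1;
    ok = fwrite(data, 1, len, fp) == len;
    /* fflush() only moves stdio's user-space buffer into the page cache. */
    ok = ok && fflush(fp) == 0;
    /* fsync() asks the kernel to write the file's data to the device. */
    ok = ok && fsync(fileno(fp)) == 0;
    if (fclose(fp) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

The test program above stops at fflush(), so nothing obliges the kernel to have written any of its records before the power is cut.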

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-01-08 13:15:16

by Tetsuo Handa

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

Hello.

Dave Chinner wrote:
> On Fri, Jan 08, 2010 at 11:54:24AM +0900, Tetsuo Handa wrote:
> > I'm experiencing file corruption problem.
> > Can somebody reproduce below result?
> >
> > My environment:
> > VMware Workstation 6.5.3 with 2CPUs / 512MB RAM.
> > ext3 filesystem ( /dev/sda1 ) mounted on / .
> >
> > 2.6.33-rc3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.33-rc3-ext3 )
> > 2.6.32.3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.32.3-ext3 )
> > 2.6.31.11 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.31.11-ext3 )
> > 2.6.30.10
> >
> > So far, I haven't succeeded to reproduce this problem for 2.6.29 and earlier.
> > Maybe this problem exists in only 2.6.30 and later.
>
> Isn't that when the default mount options changed from data=ordered to
> data=writeback?

Ah, indeed. 2.6.31 mounts data=writeback whereas 2.6.29 mounts data=ordered.

In my Ubuntu 9.10 environment, it is using data=writeback mode, and therefore
I got garbage data taken from other deleted files.
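
One way to confirm which journalling mode a filesystem is using is to look for the data= entry in its mount options (the fourth field of a /proc/mounts line). The following is a sketch under assumptions: ext3_data_mode() is a hypothetical helper, and an absent data= entry does not by itself say which mode is in effect, since the default may not be listed.

```c
#include <string.h>

/*
 * Hypothetical helper: copy the value of the "data=" mount option from a
 * comma-separated option string into `out` (truncated to fit).
 * Returns 1 if the option is present, 0 otherwise.
 */
int ext3_data_mode(const char *opts, char *out, size_t outlen)
{
    const char *p = opts;

    if (outlen == 0)
        return 0;
    while (*p) {
        size_t n = strcspn(p, ",");   /* length of the current option */

        if (n > 5 && strncmp(p, "data=", 5) == 0) {
            size_t len = n - 5;

            if (len >= outlen)
                len = outlen - 1;
            memcpy(out, p + 5, len);
            out[len] = '\0';
            return 1;
        }
        p += n;
        if (*p == ',')
            p++;                      /* skip the separator */
    }
    return 0;
}
```

Given the option string "rw,errors=remount-ro,data=writeback", the helper would report "writeback", which is the mode that produces the stale data described here.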

> You didn't fsync() it, so there is no reason for the kernel
> to have ever written it to disk. Therefore the result after powerfail
> is completely undefined - your data may be there, it may not...

I didn't call fsync(). Thus, I don't mind if the data I wrote is not written
to disk.

However, I feel something is very wrong because the file got data which I
didn't write. The file gets data from deleted files. Imagine that an
unprivileged user could obtain the contents of /etc/shadow if a power failure
occurred while the user was running ./a.out .

The file should not get data from deleted files, but I can read the data from
deleted files by "cat /testfile". I feel something is very wrong.

Regards.

2010-01-08 15:19:23

by Dave Chinner

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

On Fri, Jan 08, 2010 at 10:15:10PM +0900, Tetsuo Handa wrote:
> Dave Chinner wrote:
> > On Fri, Jan 08, 2010 at 11:54:24AM +0900, Tetsuo Handa wrote:
> > > I'm experiencing file corruption problem.
> > > Can somebody reproduce below result?
> > >
> > > My environment:
> > > VMware Workstation 6.5.3 with 2CPUs / 512MB RAM.
> > > ext3 filesystem ( /dev/sda1 ) mounted on / .
> > >
> > > 2.6.33-rc3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.33-rc3-ext3 )
> > > 2.6.32.3 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.32.3-ext3 )
> > > 2.6.31.11 ( http://I-love.SAKURA.ne.jp/tmp/config-2.6.31.11-ext3 )
> > > 2.6.30.10
> > >
> > > So far, I haven't succeeded to reproduce this problem for 2.6.29 and earlier.
> > > Maybe this problem exists in only 2.6.30 and later.
> >
> > Isn't that when the default mount options changed from data=ordered to
> > data=writeback?
> Ah, indeed. 2.6.31 mounts data=writeback whereas 2.6.29 mounts data=ordered.
>
> In my Ubuntu 9.10 environment, it is using data=writeback mode, and therefore
> I got garbage data taken from other deleted files.
>
> > You didn't fsync() it, so there is no reason for the kernel
> > to have ever written it to disk. Therefore the result after powerfail
> > is completely undefined - your data may be there, it may not...
>
> I didn't call fsync(). Thus, I don't mind if the data I wrote is not written
> to disk.

Ok, I was making sure you weren't misunderstanding what the fflush()
is supposed to guarantee - many people do, but you're not one of
them :)

> However, I feel something is very wrong because the file got data which I
> didn't write. The file gets data from deleted files. Imagine that unprivileged
> user can get the content of /etc/shadow if power failure occurred when the user
> was running ./a.out .

Ah, so it was stale data you were seeing.

> The file should not get data from deleted files, but I can read the data from
> deleted files by "cat /testfile". I feel something is very wrong.

I agree that it is very wrong, but it's a known problem with writeback
mode in ext3:

http://thread.gmane.org/gmane.linux.kernel/818044/focus=819977

More info as to how this change came about and the proposed but not
yet realised fixes:

http://lwn.net/Articles/328363/

Cheers,

Dave.
--
Dave Chinner
[email protected]

2010-01-09 02:53:08

by Tetsuo Handa

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

Dave Chinner wrote:
> I agree that it is very wrong, but it's a known problem with writeback
> mode in ext3:
>
> http://thread.gmane.org/gmane.linux.kernel/818044/focus=819977
>
> More info as to how this change came about and the proposed but not
> yet realised fixes:
>
> http://lwn.net/Articles/328363/

Thank you for the pointer.

Indeed, most Linux boxes are used by a single user.
But implicitly importing data from other deleted files is still annoying
even if the box is used by only one user.

While I was trying to identify the steps to reproduce, ./a.out was replaced
by the contents of the deleted .bash_history after a power failure. I executed
./a.out as root without knowing that the file contained the deleted
.bash_history , and many of the commands listed in it were executed as root.
I thought my box was cracked and trojaned. :-(

2010-01-11 20:13:14

by Ric Wheeler

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

On 01/08/2010 09:53 PM, Tetsuo Handa wrote:
> Dave Chinner wrote:
>
>> I agree that it is very wrong, but it's a known problem with writeback
>> mode in ext3:
>>
>> http://thread.gmane.org/gmane.linux.kernel/818044/focus=819977
>>
>> More info as to how this change came about and the proposed but not
>> yet realised fixes:
>>
>> http://lwn.net/Articles/328363/
>>
> Thank you for the pointer.
>
> Indeed, most Linux boxes are used by single user.
> But implicitly importing other deleted file's data is still annoying
> even if the box is used by only one user.
>
> When I was trying to identify the steps to reproduce, I got ./a.out replaced
> by the deleted .bash_history due to power failure. I executed ./a.out as root
> without knowing that the file contains deleted .bash_history , and many
> commands listed in deleted .bash_history are executed as root.
> I thought my box was cracked and trojaned. :-(
>

Fedora and some other distributions changed the default back to data
ordered mode in order to avoid exactly this kind of mess. Even if you
are on a single user system, this behavior is certainly unexpected for
most users :-)

Ric

2010-01-15 20:01:49

by Pavel Machek

Subject: Re: [2.6.30 and later] file corruption on ext3 filesystem.

Hi!

>>> http://lwn.net/Articles/328363/
>>>
>> Thank you for the pointer.
>>
>> Indeed, most Linux boxes are used by single user.
>> But implicitly importing other deleted file's data is still annoying
>> even if the box is used by only one user.
>>
>> When I was trying to identify the steps to reproduce, I got ./a.out replaced
>> by the deleted .bash_history due to power failure. I executed ./a.out as root
>> without knowing that the file contains deleted .bash_history , and many
>> commands listed in deleted .bash_history are executed as root.
>> I thought my box was cracked and trojaned. :-(
>
> Fedora and some other distributions changed the default back to data
> ordered mode in order to avoid exactly this kind of mess. Even if you
> are on a single user system, this behavior is certainly unexpected for
> most users :-)

Also, there's a config option (CONFIG_EXT3_DEFAULTS_TO_ORDERED) to get back
the old behaviour.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html