2010-01-29 15:03:48

by Giel de Nijs

Subject: Possible ext4 data corruption with large files and async I/O

Dear ext4 devs,

Today I hit a situation where seemingly blocks did not get written to
disk. I've narrowed it down to the following test case.

Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
i7 920 and a Core2 Q6600, I executed the following steps:

- create a file
- with kernel async i/o, write a 512kb (haven't tried other sizes) block
to an offset >4GB, effectively creating a large sparse file
- again with async i/o, write a 512kb block to an offset smaller than
the previous write, but >4GB
- wait for the kernel async i/o to tell you the writes have succeeded

Now, looking at the file, the second write never seems to have happened.
When doing this on the same machines on ext3, the behavior is as expected.
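
For readers without the attachment, the steps above can be sketched roughly as follows. This is not the attached ext4_bug_2.c; it is a minimal illustration that assumes Linux and calls the kernel AIO syscalls directly rather than through libaio. The helper names (`aio_write_block`, `run_sparse_aio_writes`), the 0xff fill pattern and the 4096-byte buffer alignment are assumptions made for illustration. The offsets (6 GiB first, then 5 GiB) and the 524288-byte size match the values that appear in the transcript later in this thread.

```c
#define _GNU_SOURCE            /* lets callers pass O_DIRECT in open_flags */
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

#define WRITE_SIZE (512 * 1024)

/* Thin wrappers around the raw kernel AIO syscalls (no libaio needed). */
static int sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return (int)syscall(SYS_io_setup, nr, ctx); }
static int sys_io_destroy(aio_context_t ctx)
{ return (int)syscall(SYS_io_destroy, ctx); }
static int sys_io_submit(aio_context_t ctx, long n, struct iocb **list)
{ return (int)syscall(SYS_io_submit, ctx, n, list); }
static int sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                            struct io_event *ev, struct timespec *timeout)
{ return (int)syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout); }

/* Submit one write and block until its completion event arrives.
 * Returns the byte count reported by the kernel, or -1 on failure. */
static long long aio_write_block(aio_context_t ctx, int fd, void *buf,
                                 size_t len, long long offset)
{
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;

    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    if (sys_io_submit(ctx, 1, list) != 1)
        return -1;
    if (sys_io_getevents(ctx, 1, 1, &ev, NULL) != 1)
        return -1;
    return (long long)ev.res;
}

/* Create `path`, write one block at 6 GiB, then one at 5 GiB.
 * open_flags is OR-ed into the open() flags (assumption: pass O_DIRECT
 * to get truly asynchronous direct I/O). Returns 0 if the kernel
 * reported both writes as complete, -1 otherwise. */
static int run_sparse_aio_writes(const char *path, int open_flags)
{
    const long long offsets[2] = { 6442450944LL, 5368709120LL };
    aio_context_t ctx = 0;     /* must be zero before io_setup */
    void *buf = NULL;
    int rc = -1;

    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR | open_flags, 0644);
    if (fd < 0)
        return -1;
    if (posix_memalign(&buf, 4096, WRITE_SIZE) != 0) {
        buf = NULL;
        goto out;
    }
    memset(buf, 0xff, WRITE_SIZE);
    if (sys_io_setup(8, &ctx) != 0)
        goto out;

    rc = 0;
    for (int i = 0; i < 2; i++)
        if (aio_write_block(ctx, fd, buf, WRITE_SIZE, offsets[i]) != WRITE_SIZE)
            rc = -1;
    sys_io_destroy(ctx);
out:
    free(buf);
    close(fd);
    return rc;
}
```

Calling `run_sparse_aio_writes("ext4_bug.testfile", O_DIRECT)` on the filesystem under test and then reading back the block at 5 GiB shows whether the second write reached the file. Whether O_DIRECT is required to trigger the problem is not stated in the report, so treat that flag as a guess.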

As far as I can tell (from the bigger program that triggered this), all
writes >4GB but < EOF to a sparse file with async i/o aren't executed.
When creating a large file first (i.e., with dd), everything does work
as expected.

Attached is some C code that triggers this bug for me.

If you need more information or want me to test some more things, please
do ask.

Thanks,
Giel de Nijs
VectorWise


Attachments:
ext4_bug_2.c (3.59 kB)

2010-01-29 15:30:22

by Nick Dokos

Subject: Re: Possible ext4 data corruption with large files and async I/O

>
> Dear ext4 devs,
>
> Today I hit a situation where seemingly blocks did not get written to
> disk. I've narrowed it down to the following test case.
>
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> i7 920 and a Core2 Q6600, I executed the following steps:
>
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
>
> Now, looking at the file, the second write never seems to have happened.
> When doing this on the same machines on ext3, the behavior is as expected.
>
> As far as I can tell (from the bigger program that triggered this), all
> writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> When creating a large file first (i.e., with dd), everything does work
> as expected.
>
> Attached is some C code that triggers this bug for me.
>
> If you need more information or want me to test some more things, please
> do ask.
>

I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
Both dd's gave the expected output.

Thanks,
Nick

Transcript:

[email protected]:~/src/ext4/giel-de-nijs# ./a.out
opening file ext4_bug.testfile
submitting write of 524288 bytes at offset 6442450944
waiting for write to be finished
got 1 events
written 524288 bytes
submitting write of 524288 bytes at offset 5368709120
waiting for write to be finished
got 1 events
written 524288 bytes
[email protected]:~/src/ext4/giel-de-nijs# dd if=ext4_bug.testfile bs=512k count=1 skip=10K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.0045471 s, 115 MB/s
0080000
[email protected]:~/src/ext4/giel-de-nijs# dd if=ext4_bug.testfile bs=512k count=1 skip=12K|hexdump
0000000 ffff ffff ffff ffff ffff ffff ffff ffff
*
1+0 records in
1+0 records out
524288 bytes (524 kB) copied, 0.00474075 s, 111 MB/s
0080000
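
The two dd invocations above read one 512 KiB block at block index 10K (10240 x 524288 = 5368709120 bytes, i.e. 5 GiB) and at 12K (12288 x 524288 = 6442450944 bytes, i.e. 6 GiB), which are the two offsets the test program wrote to. The same check can be sketched in C as below. The function name `region_is_filled` and its return convention are assumptions for illustration and are not taken from the attachment.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Check whether every byte in [offset, offset + len) of `path` equals
 * `expected`. Returns 1 if so, 0 if some byte differs (for example a
 * hole that reads back as zeros), and -1 on an open error, read error
 * or short read before `len` bytes were examined. */
static int region_is_filled(const char *path, long long offset,
                            size_t len, unsigned char expected)
{
    unsigned char buf[4096];
    size_t done = 0;
    int result = 1;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    while (done < len && result == 1) {
        size_t want = len - done < sizeof buf ? len - done : sizeof buf;
        ssize_t got = pread(fd, buf, want, offset + (long long)done);
        if (got <= 0) {            /* error or end of file */
            result = -1;
            break;
        }
        for (ssize_t i = 0; i < got; i++) {
            if (buf[i] != expected) {
                result = 0;
                break;
            }
        }
        done += (size_t)got;
    }
    close(fd);
    return result;
}
```

With the test file from the transcript, `region_is_filled("ext4_bug.testfile", 5368709120LL, 524288, 0xff)` would return 1 when the second write is present and 0 when that range still reads back as zeros.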

2010-01-29 15:30:57

by Eric Sandeen

Subject: Re: Possible ext4 data corruption with large files and async I/O

Giel de Nijs wrote:
> Dear ext4 devs,
>
> Today I hit a situation where seemingly blocks did not get written to
> disk. I've narrowed it down to the following test case.
>
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> i7 920 and a Core2 Q6600, I executed the following steps:
>
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
>
> Now, looking at the file, the second write never seems to have happened.
> When doing this on the same machines on ext3, the behavior is as expected.
>
> As far as I can tell (from the bigger program that triggered this), all
> writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> When creating a large file first (i.e., with dd), everything does work
> as expected.
>
> Attached is some C code that triggers this bug for me.
>
> If you need more information or want me to test some more things, please
> do ask.

Thanks, I can reproduce this as well - and yep works ok on ext3 & xfs,
so looks like an ext4 bug indeed. I'll look into it.

-Eric

> Thanks,
> Giel de Nijs
> VectorWise
>


2010-01-29 15:49:28

by Nick Dokos

Subject: Re: Possible ext4 data corruption with large files and async I/O

> >
> > Dear ext4 devs,
> >
> > Today I hit a situation where seemingly blocks did not get written to
> > disk. I've narrowed it down to the following test case.
> >
> > Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> > i7 920 and a Core2 Q6600, I executed the following steps:
> >
> > - create a file
> > - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> > to an offset >4GB, effectively creating a large sparse file
> > - again with async i/o, write a 512kb block to an offset smaller than
> > the previous write, but >4GB
> > - wait for the kernel async i/o to tell you the writes have succeeded
> >
> > Now, looking at the file, the second write never seems to have happened.
> > When doing this on the same machines on ext3, the behavior is as expected.
> >
> > As far as I can tell (from the bigger program that triggered this), all
> > writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> > When creating a large file first (i.e., with dd), everything does work
> > as expected.
> >
> > Attached is some C code that triggers this bug for me.
> >
> > If you need more information or want me to test some more things, please
> > do ask.
> >
>
> I ran your program on FC-11 with a 2.6.33-rc4 upstream kernel: it worked fine.
> Both dd's gave the expected output.
>

Scratch that: I goofed. I can reproduce it too.

Sorry for the confusion.

Nick

2010-01-29 18:18:17

by Eric Sandeen

Subject: Re: Possible ext4 data corruption with large files and async I/O

Giel de Nijs wrote:
> Dear ext4 devs,
>
> Today I hit a situation where seemingly blocks did not get written to
> disk. I've narrowed it down to the following test case.
>
> Running Fedora Core 12 with kernel 2.6.31.9-174.fc12.x86_64, both on an
> i7 920 and a Core2 Q6600, I executed the following steps:
>
> - create a file
> - with kernel async i/o, write a 512kb (haven't tried other sizes) block
> to an offset >4GB, effectively creating a large sparse file
> - again with async i/o, write a 512kb block to an offset smaller than
> the previous write, but >4GB
> - wait for the kernel async i/o to tell you the writes have succeeded
>
> Now, looking at the file, the second write never seems to have happened.
> When doing this on the same machines on ext3, the behavior is as expected.
>
> As far as I can tell (from the bigger program that triggered this), all
> writes >4GB but < EOF to a sparse file with async i/o aren't executed.
> When creating a large file first (i.e., with dd), everything does work
> as expected.
>
> Attached is some C code that triggers this bug for me.
>
> If you need more information or want me to test some more things, please
> do ask.
>
> Thanks,
> Giel de Nijs
> VectorWise
>

Ok, got it, will send a patch - thanks.

-Eric