Hi,
I was writing a small fio job file to do writes and read verifies on a
device. It forks 32 processes, each writing randomly to 4 files with a
block size between 4k and 16k. When it has written 1024 of those blocks,
it'll verify the oldest 512 of them. Each block is checksummed in
512-byte chunks. It uses libaio and O_DIRECT.
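
For the curious, the verify scheme conceptually works like the sketch
below. This is a simplified userspace illustration, not fio's actual
code -- the header layout and the magic value are made up here, but the
idea (each 512b chunk starts with a small header carrying a magic word
and a crc32c of the chunk) is the same:

--- snip verify sketch ---
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

#define INTERVAL  512
#define HDR_MAGIC 0xacca0100u	/* made-up magic, not fio's real one */

struct vhdr {
	uint32_t magic;
	uint32_t crc;		/* crc32c of the rest of the chunk */
};

/* bitwise CRC-32C (Castagnoli), reflected polynomial 0x82f63b78 */
static uint32_t crc32c(const uint8_t *buf, size_t len)
{
	uint32_t crc = ~0u;

	while (len--) {
		crc ^= *buf++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0x82f63b78u & -(crc & 1));
	}
	return ~crc;
}

/* re-check every 512b chunk of a block that was read back */
static int verify_block(const uint8_t *blk, size_t len, off_t file_off)
{
	for (size_t off = 0; off + INTERVAL <= len; off += INTERVAL) {
		struct vhdr hdr;

		memcpy(&hdr, blk + off, sizeof(hdr));
		if (hdr.magic != HDR_MAGIC) {
			/* a chunk of zeroes shows up as magic 0 */
			fprintf(stderr, "Bad verify header %x at %lld\n",
				hdr.magic, (long long)(file_off + off));
			return -1;
		}
		if (crc32c(blk + off + sizeof(hdr),
			   INTERVAL - sizeof(hdr)) != hdr.crc)
			return -1;
	}
	return 0;
}
--- snip verify sketch ---

A chunk that reads back as all zeroes fails the magic check, which is
exactly what the error below reports.
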
It works on ext2 and btrfs. I haven't run it to completion yet, but they
survive 15-20 minutes just fine. ext4 doesn't even go a full minute
before this triggers:
Bad verify header 0 at 10137600
fio: pid=9943, err=84/file:io_u.c:1212, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
writers: (groupid=0, jobs=32): err=84 (file:io_u.c:1212, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=9943
which tells us that where we expected to find the correct verify magic
in the header, it was all zeroes. The job file used is below, and to
reproduce you want to use the latest fio (1.40) since some earlier
versions don't do verify_interval properly for non-pattern verifies. You
can get fio here:
http://brick.kernel.dk/snaps/fio-1.40.tar.gz
or from git at:
git://git.kernel.dk/fio.git
The kernel used is 2.6.35-rc3 and I ran this on a raid0 that had 8 SSD
drives.
--- snip job file ---
[global]
direct=1
group_reporting=1
exitall
runtime=4h
time_based=1
# writers, will repeatedly randomly write and verify data
[writers]
rw=randwrite
bsrange=4k-16k
ioengine=libaio
iodepth=4
directory=/data
verify=crc32c
verify_backlog=1024
verify_backlog_batch=512
verify_interval=512
size=512m
nrfiles=4
filesize=64m-256m
numjobs=32
create_serialize=0
--- snip job file ---
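
To run it, save the above as e.g. writers.fio (the name is arbitrary)
and, with the filesystem under test mounted at /data, do:

  $ fio writers.fio
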
--
Jens Axboe
Jens Axboe wrote:
> Hi,
>
> I was writing a small fio job file to do writes and read verifies on a
> device. It forks 32 processes, each writing randomly to 4 files with a
> block size between 4k and 16k. When it has written 1024 of those blocks,
> it'll verify the oldest 512 of them. Each block is checksummed in
> 512-byte chunks. It uses libaio and O_DIRECT.
>
> It works on ext2 and btrfs. I haven't run it to completion yet, but they
> survive 15-20 minutes just fine. ext4 doesn't even go a full minute
> before this triggers:
Jens, can you try XFS too? Since ext3 can't do direct IO to a hole
(and I'm not sure about btrfs in that regard), ext4 may be most similar
to xfs's behavior on this test ... wondering how it fares.
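
For the archives, "direct IO to a hole" means a case like the sketch
below: an O_DIRECT write landing in an unallocated region of a sparse
file. Illustrative only -- the path and sizes are made up, and error
handling is omitted:

--- snip hole-write sketch ---
/* O_DIRECT write into a file hole: the fs must allocate (or convert)
 * blocks underneath an in-flight direct IO. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/data/sparse.img",
		      O_CREAT | O_RDWR | O_DIRECT, 0644);
	void *buf;

	/* a 64MB sparse file: all of it is a hole */
	ftruncate(fd, 64 << 20);

	/* O_DIRECT wants an aligned buffer; 4k covers most devices */
	posix_memalign(&buf, 4096, 4096);
	memset(buf, 0xab, 4096);

	/* direct write into the middle of the hole */
	pwrite(fd, buf, 4096, 32 << 20);

	close(fd);
	free(buf);
	return 0;
}
--- snip hole-write sketch ---
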
Thanks,
-Eric
Eric Sandeen wrote:
> Jens Axboe wrote:
>> [...]
>
> Jens, can you try XFS too? Since ext3 can't do direct IO to a hole,
> (and I'm not sure about btrfs in that regard), ext4 may be most similar
> to xfs's behavior on the test ... wondering how it fares.
>
> Thanks,
> -Eric
Actually mingming had a patch for direct-io.c which may be related, I'll
test that out.
-Eric
On 18/06/10 16.59, Eric Sandeen wrote:
> Eric Sandeen wrote:
>> Jens Axboe wrote:
>>> [...]
>>
>> Jens, can you try XFS too? Since ext3 can't do direct IO to a hole,
>> (and I'm not sure about btrfs in that regard), ext4 may be most similar
>> to xfs's behavior on the test ... wondering how it fares.
>>
>> Thanks,
>> -Eric
>
> Actually mingming had a patch for direct-io.c which may be related, I'll
> test that out.
OK, I'll try XFS tonight as well.
--
Jens Axboe
Jens Axboe wrote:
> On 18/06/10 16.59, Eric Sandeen wrote:
>
>> Eric Sandeen wrote:
>>
>>> [...]
>> Actually mingming had a patch for direct-io.c which may be related, I'll
>> test that out.
>>
>
> OK, I'll try XFS tonight as well.
>
I haven't been able to reproduce it on ext4 here, yet.
FWIW here's the patch from mingming:
For unaligned DIO writes, skip zeroing out the block if the buffer is marked
unwritten. That means an asynchronous direct IO (an append or a hole fill) is
still pending.
Signed-off-by: Mingming Cao <[email protected]>
---
fs/direct-io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Index: linux-git/fs/direct-io.c
===================================================================
--- linux-git.orig/fs/direct-io.c 2010-05-07 15:42:22.855033403 -0700
+++ linux-git/fs/direct-io.c 2010-05-07 15:44:17.695007770 -0700
@@ -740,7 +740,8 @@
 	struct page *page;
 
 	dio->start_zero_done = 1;
-	if (!dio->blkfactor || !buffer_new(&dio->map_bh))
+	if (!dio->blkfactor || !buffer_new(&dio->map_bh)
+	    || buffer_unwritten(&dio->map_bh))
 		return;
 
 	dio_blocks_per_fs_block = 1 << dio->blkfactor;
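
To spell out the intent, here is how I read the check -- a userspace
model of the decision only, not kernel code; is_new/is_unwritten stand
in for the buffer_head state bits:

--- snip decision model ---
#include <stdio.h>

/*
 * Model of the test at the top of dio_zero_block().  blkfactor is
 * log2(fs block size / device block size); zero means the IO is
 * already in fs-block units and no partial-block zeroing can apply.
 */
static int should_zero_partial_block(unsigned int blkfactor,
				     int is_new, int is_unwritten)
{
	/* upstream: only zero around a freshly allocated block */
	if (!blkfactor || !is_new)
		return 0;
	/* the patch: an unwritten (preallocated) extent is zeroed at
	 * conversion time, and an async DIO may still be pending
	 * against it, so skip the zero-out here */
	if (is_unwritten)
		return 0;
	return 1;
}

int main(void)
{
	/* 4k fs blocks on a 512b device -> blkfactor = 3 */
	printf("new block:        %d\n", should_zero_partial_block(3, 1, 0));
	printf("unwritten extent: %d\n", should_zero_partial_block(3, 1, 1));
	return 0;
}
--- snip decision model ---
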
On 2010-06-18 17:28, Eric Sandeen wrote:
> [...]
> I haven't been able to reproduce it on ext4 here, yet.
>
> FWIW here's the patch from mingming:
>
> [...]
>
What is this patch against?
--
Jens Axboe
On 2010-06-18 17:13, Jens Axboe wrote:
> On 18/06/10 16.59, Eric Sandeen wrote:
>> Eric Sandeen wrote:
>>> Jens Axboe wrote:
>>>> [...]
>>>
>>> Jens, can you try XFS too? Since ext3 can't do direct IO to a hole,
>>> (and I'm not sure about btrfs in that regard), ext4 may be most similar
>>> to xfs's behavior on the test ... wondering how it fares.
>>>
>>> Thanks,
>>> -Eric
>>
>> Actually mingming had a patch for direct-io.c which may be related, I'll
>> test that out.
>
> OK, I'll try XFS tonight as well.
XFS fails too.
--
Jens Axboe
Jens Axboe wrote:
> On 2010-06-18 17:28, Eric Sandeen wrote:
>> [...]
>
> What is this patch against?
>
Applied to 2.6.32, seems to apply upstream as well.
It hits dio_zero_block().
-Eric
On 2010-06-18 20:04, Eric Sandeen wrote:
> Jens Axboe wrote:
>> [...]
>
> Applied to 2.6.32, seems to apply upstream as well.
>
> It hits dio_zero_block().
Irk indeed, I am blind. The patch does not fix it.
--
Jens Axboe
Hi,
I would like to add that we have a similar test case which probably triggers
much faster than Jens' test case; see here:
https://bugzilla.kernel.org/show_bug.cgi?id=16165
We believe that this bug is responsible for data corruption of VirtualBox
disk images located on an ext4 file system. Please let me know how we can
help you debug this issue.
Kind regards,
Frank
--
Dr.-Ing. Frank Mehnert
Sitz der Gesellschaft:
Sun Microsystems GmbH, Sonnenallee 1, 85551 Kirchheim-Heimstetten
Amtsgericht München: HRB 161028
Geschäftsführer: Jürgen Kunz
On 2010-06-18 20:14, Jens Axboe wrote:
> On 2010-06-18 20:04, Eric Sandeen wrote:
>> Jens Axboe wrote:
>>> [...]
>>> What is this patch against?
>>>
>>
>> Applied to 2.6.32, seems to apply upstream as well.
>>
>> It hits dio_zero_block().
>
> Irk indeed, I am blind. The patch does not fix it.
So just to confirm that this isn't a new regression: 2.6.34 fails in the
same way. If I change the test so the random writes overwrite existing
blocks instead of filling holes, there are no problems.
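
By "overwrite existing blocks" I mean the files get laid out in full
before the random write phase starts -- roughly the below, run once per
file. A sketch of one way to do it, with error handling omitted:

--- snip prefill sketch ---
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1 << 20)

/* Write the whole file once so every block is allocated; later random
 * writes then overwrite real blocks instead of filling holes. */
static void prefill(const char *path, off_t size)
{
	int fd = open(path, O_CREAT | O_WRONLY, 0644);
	char *buf = calloc(1, CHUNK);
	off_t off;

	for (off = 0; off < size; off += CHUNK)
		pwrite(fd, buf, CHUNK, off);
	fsync(fd);	/* make sure the allocations are on disk */
	close(fd);
	free(buf);
}
--- snip prefill sketch ---
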
--
Jens Axboe
Jens Axboe wrote:
> [...]
>
> Bad verify header 0 at 10137600
> fio: pid=9943, err=84/file:io_u.c:1212, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character
>
> writers: (groupid=0, jobs=32): err=84 (file:io_u.c:1212, func=io_u_queued_complete, error=Invalid or incomplete multibyte or wide character): pid=9943
FYI:
I asked Jens to test hch's and Jiaying's aio completion patches with this,
and apparently those fixed this problem for him.
-Eric
On 07/07/10 16.26, Eric Sandeen wrote:
> Jens Axboe wrote:
>> [...]
>
> FYI:
>
> I asked Jens to test hch's and Jiaying's aio completion patches with this,
> and apparently those fixed this problem for him.
At least for a shorter run, but one long enough that all the holes should
have been filled by that point. So it at least fixes my test case.
I can try extending the run a bit if there's any interest in that,
and see if it still verifies correctly.
--
Jens Axboe