From: Jens Axboe
Subject: Re: fio test triggering bad data on ext4
Date: Fri, 18 Jun 2010 20:14:46 +0200
Message-ID: <4C1BB796.3020907@fusionio.com>
References: <4C1B292C.2080205@fusionio.com> <4C1B7C73.505@redhat.com>
 <4C1B89C1.6090408@redhat.com> <4C1B8D1F.3020002@fusionio.com>
 <4C1B90AE.1050703@redhat.com> <4C1BADA1.5090705@fusionio.com>
 <4C1BB52F.6040609@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Cc: "tytso@mit.edu", "adilger@sun.com", "linux-ext4@vger.kernel.org"
To: Eric Sandeen
Return-path:
Received: from 0122700014.0.fullrate.dk ([95.166.99.235]:42266 "EHLO kernel.dk"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1755359Ab0FRSOr
 (ORCPT ); Fri, 18 Jun 2010 14:14:47 -0400
In-Reply-To: <4C1BB52F.6040609@redhat.com>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

On 2010-06-18 20:04, Eric Sandeen wrote:
> Jens Axboe wrote:
>> On 2010-06-18 17:28, Eric Sandeen wrote:
>>> Jens Axboe wrote:
>>>> On 18/06/10 16.59, Eric Sandeen wrote:
>>>>
>>>>> Eric Sandeen wrote:
>>>>>
>>>>>> Jens Axboe wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was writing a small fio job file to do writes and read verifies on a
>>>>>>> device. It forks 32 processes, each writing randomly to 4 files with a
>>>>>>> block size between 4k and 16k. When it has written 1024 of those blocks,
>>>>>>> it'll verify the oldest 512 of them. Each block is checksummed for every
>>>>>>> 512b. It uses libaio and O_DIRECT.
>>>>>>>
>>>>>>> It works on ext2 and btrfs. I haven't run it to completion yet, but they
>>>>>>> survive 15-20 minutes just fine. ext4 doesn't even go a full minute
>>>>>>> before this triggers:
>>>>>>>
>>>>>> Jens, can you try XFS too? Since ext3 can't do direct IO to a hole,
>>>>>> (and I'm not sure about btrfs in that regard), ext4 may be most similar
>>>>>> to xfs's behavior on the test ... wondering how it fares.
>>>>>>
>>>>>> Thanks,
>>>>>> -Eric
>>>>>>
>>>>> Actually mingming had a patch for direct-io.c which may be related, I'll
>>>>> test that out.
>>>>>
>>>> OK, I'll try XFS tonight as well.
>>>>
>>>>
>>>>
>>> I haven't been able to reproduce it on ext4 here, yet.
>>>
>>> FWIW here's the patch from mingming:
>>>
>>> When unaligned DIO writes, skip zero out the block if the buffer is marked
>>> unwritten. That means there is an asynchronous direct IO (append or fill the hole)
>>> still pending.
>>>
>>> Signed-off-by: Mingming Cao
>>> ---
>>>  fs/direct-io.c |    3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> Index: linux-git/fs/direct-io.c
>>> ===================================================================
>>> --- linux-git.orig/fs/direct-io.c	2010-05-07 15:42:22.855033403 -0700
>>> +++ linux-git/fs/direct-io.c	2010-05-07 15:44:17.695007770 -0700
>>> @@ -740,7 +740,8 @@
>>>  	struct page *page;
>>>
>>>  	dio->start_zero_done = 1;
>>> -	if (!dio->blkfactor || !buffer_new(&dio->map_bh))
>>> +	if (!dio->blkfactor || !buffer_new(&dio->map_bh)
>>> +			|| buffer_unwritten(&dio->map_bh))
>>>  		return;
>>>
>>>  	dio_blocks_per_fs_block = 1 << dio->blkfactor;
>>>
>>>
>>
>> What is this patch against?
>>
>
> Applied to 2.6.32, seems to apply upstream as well.
>
> It hits dio_zero_block()

Irk indeed, I am blind. The patch does not fix it.

-- 
Jens Axboe