Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp3722928ybc; Mon, 25 Nov 2019 20:08:57 -0800 (PST) X-Google-Smtp-Source: APXvYqwsKQeqRahiYu7PyuqGyVW63b0Nuf0KWMq5oQFJ5dC5mCZ+plpGm4aR3dKNIpJkicmdlQTb X-Received: by 2002:a50:ce47:: with SMTP id k7mr22681349edj.150.1574741337211; Mon, 25 Nov 2019 20:08:57 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1574741337; cv=none; d=google.com; s=arc-20160816; b=cSv1ASlWxFcR+fsK54OM5XxvL33XGH2MmoSN5eM7h9PwN+1w0A4G005tW7zL6UJF8T FOqG/a0tPa8hCFMhph6DkJl/yfE9X3gJO1NEwoD/s3mSRfcAqDy92WaD2d0XDfN9YSQI 7U5saDIggUWNMdB7hub9aVyFAPyq9VTaNAVvUl5bjiMMuyWmxgPCUx3eQBmP9wZ/EqNX hm6bhn+0PV8TcIJze6BAWG0PkohYoanFEFFLjAtIA/q9Jnd9SfifAnUjnK2ABxT/C8bP ucNmYmKbU6/h6CAKlB/3FgYC0cY8zY+EIX5kkX8xC0qEQNzE05aXicTjEQy2wqzXujJp K/Kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dkim-signature; bh=e6wMTLKnA+rNgtD/pp0DryO/NWlpo8/d9a9UXsGOjg4=; b=UQhsoVGjjWzT/wJj4wzAdWVSLP097cYHpqetRauGZTbtz0n07DpxjzbuGWwJG7Xplj gbnsX8bBYcZu3CroEr0/IqYAft8sgMbIdOOBkuw3hTr/oP+St9tlzgdK09GBGcqPawLZ tVXDFqhmqMLf62eZarZBoRzvQ8N0tjv0O8O3nZ/75MpLgheUPO0F0dlmDrRyNtECoDrO et8225TrjXM5OSwO4O9zgqr8Tx4CZ1Raix4XSD6B3tGPC68xRPZOhdYsbnlkBP5ogqd2 VorN1YO8ti2DFl8eSuquA5UkhEGzzxeK66giWnIRlYN8le1lKtj66LXFBsq/xXeqHvkH dNLA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@javigon-com.20150623.gappssmtp.com header.s=20150623 header.b=N8k7cFOP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id ha8si6022063ejb.248.2019.11.25.20.08.33; Mon, 25 Nov 2019 20:08:57 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@javigon-com.20150623.gappssmtp.com header.s=20150623 header.b=N8k7cFOP; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727386AbfKZD5b (ORCPT + 99 others); Mon, 25 Nov 2019 22:57:31 -0500 Received: from mail-wr1-f68.google.com ([209.85.221.68]:37603 "EHLO mail-wr1-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726947AbfKZD5b (ORCPT ); Mon, 25 Nov 2019 22:57:31 -0500 Received: by mail-wr1-f68.google.com with SMTP id g7so8146wrw.4 for ; Mon, 25 Nov 2019 19:57:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=javigon-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:content-transfer-encoding:in-reply-to; bh=e6wMTLKnA+rNgtD/pp0DryO/NWlpo8/d9a9UXsGOjg4=; b=N8k7cFOPVjqX2XGPDgw9O2rte/reF7R/56Ns6wnvp9fJxDVw6Rid+J5wO21USAz+sY 0l9Ooms+vC9v+fUkgp1wqReUarFaRo3iZoUVbKbdCDB13AmDoSjv2GPie8BDbFmn1JeQ 70Iwob9y2qCHHmsaVBbRuDvXcGrYr7Xu6UVbFiJApu7HaredvFmVZqNqwWwG2ondOUbE kI6M+vS4q6wTwemZSa5XlLZ6cP8zUJ6tc/NpB2q7T+c2csy6ZJgSsVA6tJgCJwcoOSvN sK3I3AM3FzWg1lucCP2+ST88HvJQ4C5TuKoC7xgY2pL0+GJdAn9fVRQ8M+T/+gmhuMHj Jshw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=e6wMTLKnA+rNgtD/pp0DryO/NWlpo8/d9a9UXsGOjg4=; b=A6WQAaT64oDYPW19Z+QKw63b5l/xcWcOS3LqCVkDim1DIAQcBQkfRWxmcj39Fnvqhz 45GaZq6L0looPUeei3Yb4z+/3hASyH2l4TivgblzUOakq9jneH+ucjJdFGAcxLUl9Dr2 lbTneZ1dK4eKVA3+brTtuJ6PQ4vAHtsQ1xaD1Cl0B9CBQkv2anrdGICmLuLwysO5s2GQ BBV++walZyQ7OIY1Qjf8PT+hBFXxX1VKQ4D7E/8UiZE79QsqJlb168XxlPXOl85OdPgI l8uB6/oHuipe5QQgKoJ+iTQsXofDeo1XyklRVpr0dMog94m1aZSb5FJDH9eaVZLdv1Y/ Vpcw== X-Gm-Message-State: APjAAAXtXaEl6kK/FNYpTV3MxQ/x/rL+2HQNWOZDthvrgcp66ZroiMnv 4zsHvkwJdUwlhqM5tBluZcMzZQ== X-Received: by 2002:adf:f5cb:: with SMTP id k11mr29786781wrp.82.1574740648040; Mon, 25 Nov 2019 19:57:28 -0800 (PST) Received: from localhost (ip-5-186-122-168.cgn.fibianet.dk. [5.186.122.168]) by smtp.gmail.com with ESMTPSA id p25sm1470345wma.20.2019.11.25.19.57.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 25 Nov 2019 19:57:27 -0800 (PST) Date: Tue, 26 Nov 2019 04:57:26 +0100 From: Javier =?utf-8?B?R29uesOhbGV6?= To: Damien Le Moal Cc: "jaegeuk@kernel.org" , "yuchao0@huawei.com" , "linux-f2fs-devel@lists.sourceforge.net" , "linux-kernel@vger.kernel.org" , Javier =?utf-8?B?R29uesOhbGV6?= Subject: Re: [PATCH] f2fs: disble physical prealloc in LSF mount Message-ID: <20191126035726.xj7pierxsck6adow@mpHalley> References: <20191122085952.12754-1-javier@javigon.com> <20191125190320.g7beal27nc5ubju7@mpHalley> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 26.11.2019 02:06, Damien Le Moal wrote: >On 2019/11/26 4:03, Javier González wrote: >> On 25.11.2019 00:48, Damien Le Moal wrote: >>> On 2019/11/22 18:00, Javier González wrote: >>>> From: Javier González >>>> >>>> Fix file system corruption when using LFS mount (e.g., in zoned >>>> devices). Seems like the fallback into buffered I/O creates an >>>> inconsistency if the application is assuming both read and write DIO. I >>>> can easily reproduce a corruption with a simple RocksDB test. >>>> >>>> Might be that the f2fs_forced_buffered_io path brings some problems too, >>>> but I have not seen other failures besides this one. >>>> >>>> Problem reproducible without a zoned block device, simply by forcing >>>> LFS mount: >>>> >>>> $ sudo mkfs.f2fs -f -m /dev/nvme0n1 >>>> $ sudo mount /dev/nvme0n1 /mnt/f2fs >>>> $ sudo /opt/rocksdb/db_bench --benchmarks=fillseq --use_existing_db=0 >>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>> --db=/mnt/f2fs --num=5000 --value_size=1048576 --verify_checksum=1 >>>> --block_size=65536 >>>> >>>> Note that the options that cause the problem are: >>>> --use_direct_reads=true --use_direct_io_for_flush_and_compaction=true >>>> >>>> Fixes: f9d6d0597698 ("f2fs: fix out-place-update DIO write") >>>> >>>> Signed-off-by: Javier González >>>> --- >>>> fs/f2fs/data.c | 3 --- >>>> 1 file changed, 3 deletions(-) >>>> >>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c >>>> index 5755e897a5f0..b045dd6ab632 100644 >>>> --- a/fs/f2fs/data.c >>>> +++ b/fs/f2fs/data.c >>>> @@ -1081,9 +1081,6 @@ int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from) >>>> return err; >>>> } >>>> >>>> - if (direct_io && allow_outplace_dio(inode, iocb, from)) >>>> - return 0; >>> >>> Since for LFS mode, all DIOs can end up out of place, I think that it >>> may be better to change allow_outplace_dio() to always return true in >>> the case of LFS mode. So may be something like: >>> >>> static inline int allow_outplace_dio(struct inode *inode, >>> struct kiocb *iocb, struct iov_iter *iter) >>> { >>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>> int rw = iov_iter_rw(iter); >>> >>> return test_opt(sbi, LFS) || >>> (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); >>> } >>> >>> instead of the original: >>> >>> static inline int allow_outplace_dio(struct inode *inode, >>> struct kiocb *iocb, struct iov_iter *iter) >>> { >>> struct f2fs_sb_info *sbi = F2FS_I_SB(inode); >>> int rw = iov_iter_rw(iter); >>> >>> return (test_opt(sbi, LFS) && (rw == WRITE) && >>> !block_unaligned_IO(inode, iocb, iter)); >>> } >>> >>> Thoughts ? >>> >> >> I see what you mean and it makes sense. However, the problem I am seeing >> occurs when allow_outplace_dio() returns true, as this is what creates >> the inconsistency between the write being buffered and the read being >> DIO. > >But if the write is switched to buffered, the DIO read should use the >buffered path too, no ? Since this is all happening under VFS, the >generic DIO read path will not ensure that the buffered writes are >flushed to disk before issuing the direct read, I think. So that would >explain your data corruption, i.e. you are reading stale data on the >device before the buffered writes make it to the media. > As far as I can see, the read is always sent DIO, so yes, I also believe that we are reading stale data. This is why the corruption is not seen if preventing allow_outplace_dio() from sending the write to the buffered path. What surprises me is that this is very easy to trigger (see commit), so I assume you must have seen this with SMR in the past. Does it make sense to leave the LFS check out of the allow_outplace_dio()? Or in other words, is there a hard requirement for writes to take this path on a zoned device that I am not seeing? Something like: static inline int allow_outplace_dio(struct inode *inode, struct kiocb *iocb, struct iov_iter *iter) { struct f2fs_sb_info *sbi = F2FS_I_SB(inode); int rw = iov_iter_rw(iter); return (rw == WRITE && !block_unaligned_IO(inode, iocb, iter)); } Thanks, Javier