Subject: Re: [PATCH v2 RFC] f2fs: fix to force keeping write barrier for strict fsync mode
To: Jaegeuk Kim
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org
References: <20210601101024.119356-1-yuchao0@huawei.com> <648a96f7-2c83-e9ed-0cbd-4ee8e4797724@kernel.org> <55e069f7-662d-630c-1201-d0163b38bc17@kernel.org>
From: Chao Yu
Message-ID:
 <8f8d5645-9860-3e16-a09d-1a988ca6be72@kernel.org>
Date: Wed, 14 Jul 2021 09:15:33 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On 2021/7/14 7:34, Jaegeuk Kim wrote:
> On 07/13, Chao Yu wrote:
>> On 2021/7/8 1:48, Jaegeuk Kim wrote:
>>> On 07/02, Chao Yu wrote:
>>>> On 2021/7/2 9:32, Jaegeuk Kim wrote:
>>>>> On 07/02, Chao Yu wrote:
>>>>>> On 2021/7/2 1:10, Jaegeuk Kim wrote:
>>>>>>> On 06/01, Chao Yu wrote:
>>>>>>>> [1] https://www.mail-archive.com/linux-f2fs-devel@lists.sourceforge.net/msg15126.html
>>>>>>>>
>>>>>>>> As [1] reported, if lower device doesn't support write barrier, in below
>>>>>>>> case:
>>>>>>>>
>>>>>>>> - write page #0; persist
>>>>>>>> - overwrite page #0
>>>>>>>> - fsync
>>>>>>>>  - write data page #0 OPU into device's cache
>>>>>>>>  - write inode page into device's cache
>>>>>>>>  - issue flush
>>>>>>>
>>>>>>> Well, we have preflush for node writes, so I don't think this is the case.
>>>>>>>
>>>>>>> 	fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
>>>>>>
>>>>>> This is only used for atomic write case, right?
>>>>>>
>>>>>> I mean the common case which is called from f2fs_issue_flush() in
>>>>>> f2fs_do_sync_file().
>>>>>
>>>>> How about adding PREFLUSH when writing node blocks aligned to the above set?
>>>>
>>>> You mean implementation like v1 as below?
>>>>
>>>> https://lore.kernel.org/linux-f2fs-devel/20200120100045.70210-1-yuchao0@huawei.com/
>>>
>>> Yea, I think so. :P
>>
>> I prefer v2, we may have several schemes to improve performance with v2, e.g.
>> - use inplace IO to avoid newly added preflush
>> - use flush_merge option to avoid redundant preflush
>> - if lower device supports barrier IO, we can avoid newly added preflush
>
> Doesn't v2 give one more flush than v1?
> Why do you want to take worse one and

FUA implies an extra preflush command or a similar mechanism in the lower
device to keep the data in the bio persistent before this command completes.

Also, if the lower device doesn't support FUA natively, the block layer turns
it into an empty flush command issued after the write.

So it's hard to say which one will win the benchmark game. Maybe we need some
performance data before making the choice, but you know, it depends on the
device's characteristics.

> try to improve back? Not clear the benefit on v2.

Well, if users suffer and complain about a performance regression with v1, is
there any plan to improve it? I just thought about plan B/C/D, no matter
whether we take v1 or v2.

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> And please see do_checkpoint(), we call f2fs_flush_device_cache() and
>>>>>> commit_checkpoint() separately to keep persistence order of CP datas.
>>>>>>
>>>>>> See commit 46706d5917f4 ("f2fs: flush cp pack except cp pack 2 page at first")
>>>>>> for details.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> If SPO is triggered during flush command, inode page can be persisted
>>>>>>>> before data page #0, so that after recovery, inode page can be recovered
>>>>>>>> with new physical block address of data page #0, however there may
>>>>>>>> contains dummy data in new physical block address.
>>>>>>>>
>>>>>>>> Then what user will see is: after overwrite & fsync + SPO, old data in
>>>>>>>> file was corrupted, if any user do care about such case, we can suggest
>>>>>>>> user to use STRICT fsync mode, in this mode, we will force to trigger
>>>>>>>> preflush command to persist data in device cache in prior to node
>>>>>>>> writeback, it avoids potential data corruption during fsync().
>>>>>>>>
>>>>>>>> Signed-off-by: Chao Yu
>>>>>>>> ---
>>>>>>>> v2:
>>>>>>>> - fix this by adding additional preflush command rather than using
>>>>>>>> atomic write flow.
>>>>>>>>  fs/f2fs/file.c | 14 ++++++++++++++
>>>>>>>>  1 file changed, 14 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>> index 7d5311d54f63..238ca2a733ac 100644
>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>> @@ -301,6 +301,20 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
>>>>>>>>  				f2fs_exist_written_data(sbi, ino, UPDATE_INO))
>>>>>>>>  			goto flush_out;
>>>>>>>>  		goto out;
>>>>>>>> +	} else {
>>>>>>>> +		/*
>>>>>>>> +		 * for OPU case, during fsync(), node can be persisted before
>>>>>>>> +		 * data when lower device doesn't support write barrier, result
>>>>>>>> +		 * in data corruption after SPO.
>>>>>>>> +		 * So for strict fsync mode, force to trigger preflush to keep
>>>>>>>> +		 * data/node write order to avoid potential data corruption.
>>>>>>>> +		 */
>>>>>>>> +		if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
>>>>>>>> +								!atomic) {
>>>>>>>> +			ret = f2fs_issue_flush(sbi, inode->i_ino);
>>>>>>>> +			if (ret)
>>>>>>>> +				goto out;
>>>>>>>> +		}
>>>>>>>>  	}
>>>>>>>>  go_write:
>>>>>>>>  	/*
>>>>>>>> --
>>>>>>>> 2.29.2
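
For illustration only (this is not kernel code): the ordering problem described
in the quoted commit message can be sketched as a tiny userspace model. All
names here (struct dev, dev_write(), dev_flush(), dev_flush_interrupted(),
fsync_then_spo()) are hypothetical. The model assumes a device with a volatile
cache and no write barrier, so an interrupted flush may drain cached blocks in
any order, and it assumes the worst case where the inode page is drained first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device with a volatile write cache. */
enum blk { DATA_PAGE, INODE_PAGE, NR_BLKS };

struct dev {
	bool cached[NR_BLKS];    /* written, but only in the volatile cache */
	bool persisted[NR_BLKS]; /* safely on stable media */
};

/* A write only reaches the device's cache. */
static void dev_write(struct dev *d, enum blk b)
{
	d->cached[b] = true;
}

/* A flush that completes: everything in the cache becomes persistent. */
static void dev_flush(struct dev *d)
{
	for (int i = 0; i < NR_BLKS; i++) {
		if (d->cached[i]) {
			d->persisted[i] = true;
			d->cached[i] = false;
		}
	}
}

/*
 * A flush cut short by sudden power-off (SPO). Without a write barrier the
 * device may drain its cache in any order; `order` lists the blocks that
 * made it to media before power was lost. Everything else is dropped.
 */
static void dev_flush_interrupted(struct dev *d, const enum blk *order, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(order[i] < NR_BLKS);
		if (d->cached[order[i]])
			d->persisted[order[i]] = true;
	}
	for (int i = 0; i < NR_BLKS; i++)
		d->cached[i] = false;
}

/* Corruption: the inode points at the new block, but the data is not there. */
static bool corrupted_after_spo(const struct dev *d)
{
	return d->persisted[INODE_PAGE] && !d->persisted[DATA_PAGE];
}

/*
 * fsync() followed by SPO during the final flush. With strict == true an
 * extra flush completes between the data write and the node write, which is
 * the ordering the v2 patch enforces for strict fsync mode.
 */
bool fsync_then_spo(bool strict)
{
	struct dev d = { { false }, { false } };
	const enum blk worst_case[] = { INODE_PAGE }; /* inode drained first */

	dev_write(&d, DATA_PAGE);
	if (strict)
		dev_flush(&d);
	dev_write(&d, INODE_PAGE);
	dev_flush_interrupted(&d, worst_case, 1);

	return corrupted_after_spo(&d);
}
```

In this model fsync_then_spo(false) reports corruption and
fsync_then_spo(true) does not, because the data page is already on media
before the inode page is written. It says nothing about the relative cost of
v1 and v2, which still needs performance data from real devices.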