Subject: Re: [PATCH v2] f2fs: separate NOCoW and pinfile semantics
To: Jaegeuk Kim
References: <20190719073903.9138-1-yuchao0@huawei.com>
 <20190723023640.GC60778@jaegeuk-macbookpro.roam.corp.google.com>
 <20190729055738.GA95664@jaegeuk-macbookpro.roam.corp.google.com>
 <07cd3aba-3516-9ba5-286e-277abb98e244@huawei.com>
 <20190730180231.GB76478@jaegeuk-macbookpro.roam.corp.google.com>
 <00e70eb1-c4fa-a6c9-69d7-71ff995c7d6c@huawei.com>
 <20190801041435.GB84433@jaegeuk-macbookpro.roam.corp.google.com>
 <20190801222746.GA27597@jaegeuk-macbookpro.roam.corp.google.com>
From: Chao Yu
Message-ID: <5d566fce-4412-65b2-e9d9-279b648f7551@huawei.com>
Date: Fri, 2 Aug 2019 15:55:18 +0800
In-Reply-To:
 <20190801222746.GA27597@jaegeuk-macbookpro.roam.corp.google.com>
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/8/2 6:27, Jaegeuk Kim wrote:
> On 08/01, Chao Yu wrote:
>> On 2019/8/1 12:14, Jaegeuk Kim wrote:
>>> On 07/31, Chao Yu wrote:
>>>> On 2019/7/31 2:02, Jaegeuk Kim wrote:
>>>>> On 07/29, Chao Yu wrote:
>>>>>> On 2019/7/29 13:57, Jaegeuk Kim wrote:
>>>>>>> On 07/23, Chao Yu wrote:
>>>>>>>> On 2019/7/23 10:36, Jaegeuk Kim wrote:
>>>>>>>>> On 07/19, Chao Yu wrote:
>>>>>>>>>> Pinning a file is heavy, because skipping pinned files make GC
>>>>>>>>>> running with heavy load or no effect.
>>>>>>>>>
>>>>>>>>> Pinned file is a part of NOCOW files, so I don't think we can simply drop it
>>>>>>>>> for backward compatibility.
>>>>>>>>
>>>>>>>> Yes,
>>>>>>>>
>>>>>>>> But what I concerned is that pin file is too heavy, so in order to satisfy below
>>>>>>>> demand, how about introducing pin_file_2 flag to triggering IPU only during
>>>>>>>> flush/writeback.
>>>>>>>
>>>>>>> That can be done by cold files?
>>>>>>
>>>>>> Then it may inherit property of cold type file, e.g. a) goes into cold area; b)
>>>>>> update with very low frequency.
>>>>>>
>>>>>> Actually pin_file_2 could be used by db-wal/log file, which are updated
>>>>>> frequently, and should go to hot/warm area, it does not match above two property.
>>>>>
>>>>> How about considering another name like "IPU-only mode"?
>>>>>
>>>>>             fallocate        write    Flag         GC
>>>>> Pin_file:   preallocate      IPU      FS_NOCOW_FL  Not allowed
>>>>> IPU_file:   Not preallocate  IPU      N/A          Default by temperature
>>>>
>>>> One question, do we need preallocate physical block address for IPU_file as
>>>> Pin_file? since it can enhance db file's sequential read performance, not sure,
>>>> db can handle random data in preallocated blocks.
>>>
>>> db file will do atomic writes, which can not be used with this. -wal may be able
>>
>> Now WAL mode were set by default in Android, so most of db file are -wal type now.
>
> Will be back again tho. R?
>
>>
>>> to preallocate blocks, but it can eat disk space unnecessarily.
>>
>> I meant .db-wal file rather than .db.
>>
>> Yes, that's ext4 style, that would bring better performance due to less holes in
>> block distribution.
>>
>> I don't think we need to worry about space issue for db-wal file. I tracked
>> .db-wal file's update before:
>> - there are very frequently truncation and deletion, that means the preallocated
>> blocks won't exist for long time.
>> - and also there are very frequently append writes, I suspect there almost very
>> few preallocate block are not written.
>> - total db-wal file number is less.
>
> Sometimes it can be large enough for system.

For this, it's a trade-off:
- lose a little disk space at the very beginning of the db-wal lifecycle
Or
- face fragmentation and read performance degradation.

> If it's from user apps and short lived, why do we need preallocation?

It triggers a sequential read of the db-wal file during checkpoint; although the
file is short-lived, it can still affect performance.

What do you think of running some performance tests on WAL files to decide the
preallocation policy?

Thanks,

>
>>
>>>
>>>>
>>>> Other behaviors looks good to me. :)
>>>>
>>>> I plan to use last bit in inode.i_inline to store this flag.
>>>
>>> Why not using i_flag like FS_NOCOW_FL?
>>
>> Oops, as you listed in last email, I can see you don't want to break
>> FS_NOCOW_FL's semantics for backward compatibility.
>>
>>             Flag
>> IPU_file    N/A
>>
>> If we plan to use FS_NOCOW_FL, that's what this patch has already did, you can
>> merge it directly... :P
>>
>>>
>>>>
>>>>> Cold_file:  Not preallocate  IPU      N/A          Move in cold area
>>>>> Hot_file:   Not preallocate  IPU/OPU  N/A          Move in hot area
>>>>
>>>> Should hot file be gced to hot area? That would mix new hot data with old 'hot'
>>>> data which actually become cold.
>>>
>>> But, user explicitly specified this is hot.
>>
>> With current implementation, GC will migrate data from hot/warm/cold area to
>> cold area.
>>
>> Thanks,
>>
>>>
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thank,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So that this patch propose to separate nocow and pinfile semantics:
>>>>>>>>>> - NOCoW flag can only be set on regular file.
>>>>>>>>>> - NOCoW file will only trigger IPU at common writeback/flush.
>>>>>>>>>> - NOCow file will do OPU during GC.
>>>>>>>>>>
>>>>>>>>>> For the demand of 1) avoid fragment of file's physical block and
>>>>>>>>>> 2) userspace don't care about file's specific physical address,
>>>>>>>>>> tagging file as NOCoW will be cheaper than pinned one.
>>>>>>>>
>>>>>>>> ^^^
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Chao Yu
>>>>>>>>>> ---
>>>>>>>>>> v2:
>>>>>>>>>> - rebase code to fix compile error.
>>>>>>>>>>  fs/f2fs/data.c |  3 ++-
>>>>>>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>>>>>>  fs/f2fs/file.c | 22 +++++++++++++++++++---
>>>>>>>>>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>>>> index a2a28bb269bf..15fb8954c363 100644
>>>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>>>> @@ -1884,7 +1884,8 @@ static inline bool check_inplace_update_policy(struct inode *inode,
>>>>>>>>>>
>>>>>>>>>>  bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio)
>>>>>>>>>>  {
>>>>>>>>>> -	if (f2fs_is_pinned_file(inode))
>>>>>>>>>> +	if (f2fs_is_pinned_file(inode) ||
>>>>>>>>>> +		F2FS_I(inode)->i_flags & F2FS_NOCOW_FL)
>>>>>>>>>>  		return true;
>>>>>>>>>>
>>>>>>>>>>  	/* if this is cold file, we should overwrite to avoid fragmentation */
>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>>> index 596ab3e1dd7b..f6c5a3d2e659 100644
>>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>>> @@ -2374,6 +2374,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>>>>>>>>>>  #define F2FS_NOATIME_FL		0x00000080 /* do not update atime */
>>>>>>>>>>  #define F2FS_INDEX_FL		0x00001000 /* hash-indexed directory */
>>>>>>>>>>  #define F2FS_DIRSYNC_FL		0x00010000 /* dirsync behaviour (directories only) */
>>>>>>>>>> +#define F2FS_NOCOW_FL		0x00800000 /* Do not cow file */
>>>>>>>>>>  #define F2FS_PROJINHERIT_FL	0x20000000 /* Create with parents projid */
>>>>>>>>>>
>>>>>>>>>>  /* Flags that should be inherited by new inodes from their parent. */
>>>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>>>> index 7ca545874060..ae0fec54cac6 100644
>>>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>>>> @@ -1692,6 +1692,7 @@ static const struct {
>>>>>>>>>>  	{ F2FS_NOATIME_FL,	FS_NOATIME_FL },
>>>>>>>>>>  	{ F2FS_INDEX_FL,	FS_INDEX_FL },
>>>>>>>>>>  	{ F2FS_DIRSYNC_FL,	FS_DIRSYNC_FL },
>>>>>>>>>> +	{ F2FS_NOCOW_FL,	FS_NOCOW_FL },
>>>>>>>>>>  	{ F2FS_PROJINHERIT_FL,	FS_PROJINHERIT_FL },
>>>>>>>>>>  };
>>>>>>>>>>
>>>>>>>>>> @@ -1715,7 +1716,8 @@ static const struct {
>>>>>>>>>>  		FS_NODUMP_FL |		\
>>>>>>>>>>  		FS_NOATIME_FL |		\
>>>>>>>>>>  		FS_DIRSYNC_FL |		\
>>>>>>>>>> -		FS_PROJINHERIT_FL)
>>>>>>>>>> +		FS_PROJINHERIT_FL |	\
>>>>>>>>>> +		FS_NOCOW_FL)
>>>>>>>>>>
>>>>>>>>>>  /* Convert f2fs on-disk i_flags to FS_IOC_{GET,SET}FLAGS flags */
>>>>>>>>>>  static inline u32 f2fs_iflags_to_fsflags(u32 iflags)
>>>>>>>>>> @@ -1753,8 +1755,6 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)
>>>>>>>>>>  		fsflags |= FS_ENCRYPT_FL;
>>>>>>>>>>  	if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
>>>>>>>>>>  		fsflags |= FS_INLINE_DATA_FL;
>>>>>>>>>> -	if (is_inode_flag_set(inode, FI_PIN_FILE))
>>>>>>>>>> -		fsflags |= FS_NOCOW_FL;
>>>>>>>>>>
>>>>>>>>>>  	fsflags &= F2FS_GETTABLE_FS_FL;
>>>>>>>>>>
>>>>>>>>>> @@ -1794,6 +1794,22 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
>>>>>>>>>>  	if (ret)
>>>>>>>>>>  		goto out;
>>>>>>>>>>
>>>>>>>>>> +	if ((fsflags ^ old_fsflags) & FS_NOCOW_FL) {
>>>>>>>>>> +		if (!S_ISREG(inode->i_mode)) {
>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>> +			goto out;
>>>>>>>>>> +		}
>>>>>>>>>> +
>>>>>>>>>> +		if (f2fs_should_update_outplace(inode, NULL)) {
>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>> +			goto out;
>>>>>>>>>> +		}
>>>>>>>>>> +
>>>>>>>>>> +		ret = f2fs_convert_inline_inode(inode);
>>>>>>>>>> +		if (ret)
>>>>>>>>>> +			goto out;
>>>>>>>>>> +	}
>>>>>>>>>> +
>>>>>>>>>>  	ret = f2fs_setflags_common(inode, iflags,
>>>>>>>>>>  			f2fs_fsflags_to_iflags(F2FS_SETTABLE_FS_FL));
>>>>>>>>>>  out:
>>>>>>>>>> --
>>>>>>>>>> 2.18.0.rc1
>>>>>>>>> .
>>>>>>>>>
>>>>>>> .
>>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
>
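
For reference, the semantics proposed in the quoted patch description (NOCoW
only on regular files, IPU at writeback/flush, OPU during GC, and GC not
allowed for pinned files) can be summarized as a small userspace model. This is
only a sketch under stated assumptions, not kernel code: struct file_model,
enum write_mode and the three helper functions are hypothetical names made up
for illustration, and the model ignores the regular f2fs IPU policy as well as
the f2fs_should_update_outplace() check from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum write_mode { WRITE_IPU, WRITE_OPU };

struct file_model {
	bool is_regular;	/* corresponds to S_ISREG() in the patch */
	bool pinned;		/* corresponds to FI_PIN_FILE */
	bool nocow;		/* corresponds to F2FS_NOCOW_FL after the patch */
};

/* Patch description: the NOCoW flag can only be set on a regular file. */
static bool can_set_nocow(const struct file_model *f)
{
	assert(f != NULL);
	return f->is_regular;
}

/*
 * Writeback/flush: pinned and NOCoW files both force in-place update.
 * For every other file the real code consults its IPU policy; this
 * model simply reports OPU for them (an assumption of the sketch).
 */
static enum write_mode writeback_mode(const struct file_model *f)
{
	assert(f != NULL);
	return (f->pinned || f->nocow) ? WRITE_IPU : WRITE_OPU;
}

/* GC: pinned files are skipped; NOCoW files may be moved (OPU). */
static bool gc_may_migrate(const struct file_model *f)
{
	assert(f != NULL);
	return !f->pinned;
}
```

With this model, a file with only nocow set reports WRITE_IPU from
writeback_mode() while gc_may_migrate() still returns true, which is the
difference from a pinned file that the thread is discussing.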