Subject: Re: [PATCH v2] f2fs: separate NOCoW and pinfile semantics
From: Chao Yu
To: Jaegeuk Kim
Date: Tue, 6 Aug 2019 09:36:36 +0800
X-Mailing-List: linux-kernel@vger.kernel.org

On 2019/8/6 8:37, Jaegeuk Kim wrote:
> On 08/02, Chao Yu wrote:
>> On 2019/8/2 6:27, Jaegeuk Kim wrote:
>>> On 08/01, Chao Yu wrote:
>>>> On 2019/8/1 12:14, Jaegeuk Kim wrote:
>>>>> On 07/31, Chao Yu wrote:
>>>>>> On 2019/7/31 2:02, Jaegeuk Kim wrote:
>>>>>>> On 07/29, Chao Yu wrote:
>>>>>>>> On 2019/7/29 13:57, Jaegeuk Kim wrote:
>>>>>>>>> On 07/23, Chao Yu wrote:
>>>>>>>>>> On 2019/7/23 10:36, Jaegeuk Kim wrote:
>>>>>>>>>>> On 07/19, Chao Yu wrote:
>>>>>>>>>>>> Pinning a file is heavy, because skipping pinned files make GC
>>>>>>>>>>>> running with heavy load or no effect.
>>>>>>>>>>>
>>>>>>>>>>> Pinned file is a part of NOCOW files, so I don't think we can simply drop it
>>>>>>>>>>> for backward compatibility.
>>>>>>>>>>
>>>>>>>>>> Yes,
>>>>>>>>>>
>>>>>>>>>> But what I concerned is that pin file is too heavy, so in order to satisfy below
>>>>>>>>>> demand, how about introducing pin_file_2 flag to triggering IPU only during
>>>>>>>>>> flush/writeback.
>>>>>>>>>
>>>>>>>>> That can be done by cold files?
>>>>>>>>
>>>>>>>> Then it may inherit property of cold type file, e.g. a) goes into cold area; b)
>>>>>>>> update with very low frequency.
>>>>>>>>
>>>>>>>> Actually pin_file_2 could be used by db-wal/log file, which are updated
>>>>>>>> frequently, and should go to hot/warm area, it does not match above two property.
>>>>>>>
>>>>>>> How about considering another name like "IPU-only mode"?
>>>>>>>
>>>>>>>             fallocate        write    Flag         GC
>>>>>>> Pin_file:   preallocate      IPU      FS_NOCOW_FL  Not allowed
>>>>>>> IPU_file:   Not preallocate  IPU      N/A          Default by temperature
>>>>>>
>>>>>> One question, do we need preallocate physical block address for IPU_file as
>>>>>> Pin_file? since it can enhance db file's sequential read performance, not sure,
>>>>>> db can handle random data in preallocated blocks.
>>>>>
>>>>> db file will do atomic writes, which can not be used with this. -wal may be able
>>>>
>>>> Now WAL mode were set by default in Android, so most of db file are -wal type now.
>>>
>>> Will be back again tho.
>>
>> R?
>
> Q.
>
>>
>>>
>>>>
>>>>> to preallocate blocks, but it can eat disk space unnecessarily.
>>>>
>>>> I meant .db-wal file rather than .db.
>>>>
>>>> Yes, that's ext4 style, that would bring better performance due to less holes in
>>>> block distribution.
>>>>
>>>> I don't think we need to worry about space issue for db-wal file. I tracked
>>>> .db-wal file's update before:
>>>> - there are very frequently truncation and deletion, that means the preallocated
>>>> blocks won't exist for long time.
>>>> - and also there are very frequently append writes, I suspect there almost very
>>>> few preallocate block are not written.
>>>> - total db-wal file number is less.
>>>
>>> Sometimes it can be large enough for system.
>>
>> For this, it's trade off:
>> - lose a few disk space at the very begin of db-wal lifecycle Or
>> - face fragment and read performance degradation.
>>
>>> If it's from user apps and short lived, why do we need preallocation?
>>
>> It triggers sequential read on db-wal file during checkpoint, though it's short
>> lived, still it can affect performance.
>>
>> What do you think of doing some performance test on WAL file to decide the
>> preallocation policy?
>
> Good idea. Can we?

Let me test for numbers later.

Thanks,

>
>>
>> Thanks,
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Other behaviors looks good to me. :)
>>>>>>
>>>>>> I plan to use last bit in inode.i_inline to store this flag.
>>>>>
>>>>> Why not using i_flag like FS_NOCOW_FL?
>>>>
>>>> Oops, as you listed in last email, I can see you don't want to break
>>>> FS_NOCOW_FL's semantics for backward compatibility.
>>>>
>>>>             Flag
>>>> IPU_file    N/A
>>>>
>>>> If we plan to use FS_NOCOW_FL, that's what this patch has already did, you can
>>>> merge it directly... :P
>>>>
>>>>>
>>>>>>
>>>>>>> Cold_file:  Not preallocate  IPU      N/A          Move in cold area
>>>>>>> Hot_file:   Not preallocate  IPU/OPU  N/A          Move in hot area
>>>>>>
>>>>>> Should hot file be gced to hot area? That would mix new hot data with old 'hot'
>>>>>> data which actually become cold.
>>>>>
>>>>> But, user explicitly specified this is hot.
>>>>
>>>> With current implementation, GC will migrate data from hot/warm/cold area to
>>>> cold area.
>>>>
>>>> Thanks,
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Thank,
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> So that this patch propose to separate nocow and pinfile semantics:
>>>>>>>>>>>> - NOCoW flag can only be set on regular file.
>>>>>>>>>>>> - NOCoW file will only trigger IPU at common writeback/flush.
>>>>>>>>>>>> - NOCow file will do OPU during GC.
>>>>>>>>>>>>
>>>>>>>>>>>> For the demand of 1) avoid fragment of file's physical block and
>>>>>>>>>>>> 2) userspace don't care about file's specific physical address,
>>>>>>>>>>>> tagging file as NOCoW will be cheaper than pinned one.
>>>>>>>>>>
>>>>>>>>>> ^^^
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Chao Yu
>>>>>>>>>>>> ---
>>>>>>>>>>>> v2:
>>>>>>>>>>>> - rebase code to fix compile error.
>>>>>>>>>>>>  fs/f2fs/data.c |  3 ++-
>>>>>>>>>>>>  fs/f2fs/f2fs.h |  1 +
>>>>>>>>>>>>  fs/f2fs/file.c | 22 +++++++++++++++++++---
>>>>>>>>>>>>  3 files changed, 22 insertions(+), 4 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
>>>>>>>>>>>> index a2a28bb269bf..15fb8954c363 100644
>>>>>>>>>>>> --- a/fs/f2fs/data.c
>>>>>>>>>>>> +++ b/fs/f2fs/data.c
>>>>>>>>>>>> @@ -1884,7 +1884,8 @@ static inline bool check_inplace_update_policy(struct inode *inode,
>>>>>>>>>>>>
>>>>>>>>>>>>  bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -	if (f2fs_is_pinned_file(inode))
>>>>>>>>>>>> +	if (f2fs_is_pinned_file(inode) ||
>>>>>>>>>>>> +			F2FS_I(inode)->i_flags & F2FS_NOCOW_FL)
>>>>>>>>>>>>  		return true;
>>>>>>>>>>>>
>>>>>>>>>>>>  	/* if this is cold file, we should overwrite to avoid fragmentation */
>>>>>>>>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>>>>>>>>> index 596ab3e1dd7b..f6c5a3d2e659 100644
>>>>>>>>>>>> --- a/fs/f2fs/f2fs.h
>>>>>>>>>>>> +++ b/fs/f2fs/f2fs.h
>>>>>>>>>>>> @@ -2374,6 +2374,7 @@ static inline void f2fs_change_bit(unsigned int nr, char *addr)
>>>>>>>>>>>>  #define F2FS_NOATIME_FL		0x00000080 /* do not update atime */
>>>>>>>>>>>>  #define F2FS_INDEX_FL		0x00001000 /* hash-indexed directory */
>>>>>>>>>>>>  #define F2FS_DIRSYNC_FL		0x00010000 /* dirsync behaviour (directories only) */
>>>>>>>>>>>> +#define F2FS_NOCOW_FL		0x00800000 /* Do not cow file */
>>>>>>>>>>>>  #define F2FS_PROJINHERIT_FL	0x20000000 /* Create with parents projid */
>>>>>>>>>>>>
>>>>>>>>>>>>  /* Flags that should be inherited by new inodes from their parent. */
>>>>>>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>>>>>>>>> index 7ca545874060..ae0fec54cac6 100644
>>>>>>>>>>>> --- a/fs/f2fs/file.c
>>>>>>>>>>>> +++ b/fs/f2fs/file.c
>>>>>>>>>>>> @@ -1692,6 +1692,7 @@ static const struct {
>>>>>>>>>>>>  	{ F2FS_NOATIME_FL,	FS_NOATIME_FL },
>>>>>>>>>>>>  	{ F2FS_INDEX_FL,	FS_INDEX_FL },
>>>>>>>>>>>>  	{ F2FS_DIRSYNC_FL,	FS_DIRSYNC_FL },
>>>>>>>>>>>> +	{ F2FS_NOCOW_FL,	FS_NOCOW_FL },
>>>>>>>>>>>>  	{ F2FS_PROJINHERIT_FL,	FS_PROJINHERIT_FL },
>>>>>>>>>>>>  };
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1715,7 +1716,8 @@ static const struct {
>>>>>>>>>>>>  		FS_NODUMP_FL |		\
>>>>>>>>>>>>  		FS_NOATIME_FL |		\
>>>>>>>>>>>>  		FS_DIRSYNC_FL |		\
>>>>>>>>>>>> -		FS_PROJINHERIT_FL)
>>>>>>>>>>>> +		FS_PROJINHERIT_FL |	\
>>>>>>>>>>>> +		FS_NOCOW_FL)
>>>>>>>>>>>>
>>>>>>>>>>>>  /* Convert f2fs on-disk i_flags to FS_IOC_{GET,SET}FLAGS flags */
>>>>>>>>>>>>  static inline u32 f2fs_iflags_to_fsflags(u32 iflags)
>>>>>>>>>>>> @@ -1753,8 +1755,6 @@ static int f2fs_ioc_getflags(struct file *filp, unsigned long arg)
>>>>>>>>>>>>  		fsflags |= FS_ENCRYPT_FL;
>>>>>>>>>>>>  	if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
>>>>>>>>>>>>  		fsflags |= FS_INLINE_DATA_FL;
>>>>>>>>>>>> -	if (is_inode_flag_set(inode, FI_PIN_FILE))
>>>>>>>>>>>> -		fsflags |= FS_NOCOW_FL;
>>>>>>>>>>>>
>>>>>>>>>>>>  	fsflags &= F2FS_GETTABLE_FS_FL;
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1794,6 +1794,22 @@ static int f2fs_ioc_setflags(struct file *filp, unsigned long arg)
>>>>>>>>>>>>  	if (ret)
>>>>>>>>>>>>  		goto out;
>>>>>>>>>>>>
>>>>>>>>>>>> +	if ((fsflags ^ old_fsflags) & FS_NOCOW_FL) {
>>>>>>>>>>>> +		if (!S_ISREG(inode->i_mode)) {
>>>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +		}
>>>>>>>>>>>> +
>>>>>>>>>>>> +		if (f2fs_should_update_outplace(inode, NULL)) {
>>>>>>>>>>>> +			ret = -EINVAL;
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +		}
>>>>>>>>>>>> +
>>>>>>>>>>>> +		ret = f2fs_convert_inline_inode(inode);
>>>>>>>>>>>> +		if (ret)
>>>>>>>>>>>> +			goto out;
>>>>>>>>>>>> +	}
>>>>>>>>>>>> +
>>>>>>>>>>>>  	ret = f2fs_setflags_common(inode, iflags,
>>>>>>>>>>>>  			f2fs_fsflags_to_iflags(F2FS_SETTABLE_FS_FL));
>>>>>>>>>>>>  out:
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.18.0.rc1
>>>>>>>>>>> .
>>>>>>>>>>>
>>>>>>>>> .
>>>>>>>>>
>>>>>>> .
>>>>>>>
>>>>> .
>>>>>
>>> .
>>>
> .
>
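
For anyone following the thread without the tree at hand, here is a rough userspace sketch of the two pieces the quoted patch touches: the table-driven i_flags to FS_IOC_{GET,SET}FLAGS conversion, and the "NOCoW can only be toggled on a regular file" check. All demo_* and DEMO_* names are made up for illustration. The F2FS-side bit values are copied from the quoted f2fs.h hunk, and the FS_* side is assumed to use the same bits. The f2fs_should_update_outplace() and inline-conversion steps from the patch are left out, so this is only a model, not the kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* On-disk flag bits, values taken from the quoted f2fs.h hunk. */
#define DEMO_F2FS_NOATIME_FL     0x00000080u
#define DEMO_F2FS_NOCOW_FL       0x00800000u
#define DEMO_F2FS_PROJINHERIT_FL 0x20000000u

/* User-visible flag bits; assumed here to share the same values. */
#define DEMO_FS_NOATIME_FL       0x00000080u
#define DEMO_FS_NOCOW_FL         0x00800000u
#define DEMO_FS_PROJINHERIT_FL   0x20000000u

/* Mapping table in the style of the one the patch extends. */
static const struct {
	uint32_t iflag;
	uint32_t fsflag;
} demo_flag_map[] = {
	{ DEMO_F2FS_NOATIME_FL,     DEMO_FS_NOATIME_FL },
	{ DEMO_F2FS_NOCOW_FL,       DEMO_FS_NOCOW_FL },
	{ DEMO_F2FS_PROJINHERIT_FL, DEMO_FS_PROJINHERIT_FL },
};

/* Translate on-disk flags to user-visible flags; unknown bits are dropped. */
uint32_t demo_iflags_to_fsflags(uint32_t iflags)
{
	uint32_t fsflags = 0;
	size_t i;

	for (i = 0; i < sizeof(demo_flag_map) / sizeof(demo_flag_map[0]); i++)
		if (iflags & demo_flag_map[i].iflag)
			fsflags |= demo_flag_map[i].fsflag;
	return fsflags;
}

/*
 * Return 0 if the flag change is acceptable, -1 (standing in for -EINVAL)
 * if the NOCoW bit is being toggled on something that is not a regular file.
 */
int demo_check_nocow_change(uint32_t old_fsflags, uint32_t new_fsflags,
			    int is_regular_file)
{
	if (((old_fsflags ^ new_fsflags) & DEMO_FS_NOCOW_FL) && !is_regular_file)
		return -1;
	return 0;
}

/* Small usage example. */
void demo_self_check(void)
{
	assert(demo_iflags_to_fsflags(DEMO_F2FS_NOCOW_FL) == DEMO_FS_NOCOW_FL);
	assert(demo_check_nocow_change(0, DEMO_FS_NOCOW_FL, 1) == 0);
	assert(demo_check_nocow_change(0, DEMO_FS_NOCOW_FL, 0) == -1);
}
```

The XOR in demo_check_nocow_change() mirrors the (fsflags ^ old_fsflags) test in the patch: the check only fires when the NOCoW bit actually changes, so other flag updates on directories are not rejected.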