Subject: Re: [PATCH] f2fs: separate hot/cold in free nid
To: Jaegeuk Kim, Chao Yu
Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org
From: Chao Yu
Message-ID: <5f189a17-992d-9911-abdd-f4a493cbffc7@kernel.org>
Date: Mon, 23 Apr 2018 22:53:35 +0800

On 2018/4/23 22:49, Jaegeuk Kim wrote:
> On 04/20, Chao Yu wrote:
>> On 2018/4/20 12:04, Chao Yu wrote:
>>> On 2018/4/20 11:37, Jaegeuk Kim wrote:
>>>> On 04/20, Chao Yu wrote:
>>>>> Most indirect nodes, dindirect nodes and xattr nodes are not updated after
>>>>> they are created, while inode nodes and other direct nodes change much more
>>>>> frequently. Storing the NAT entries of both kinds mixed across the whole NAT
>>>>> table causes:
>>>>> - the NAT table to fragment quickly because of the different update rates
>>>>> - more NAT block updates because of the fragmented NAT table
>>>>>
>>>>> To solve this, we try to split the whole NAT table into two parts:
>>>>> a. Hot free nid area:
>>>>>  - range: [nid #0, nid #x)
>>>>>  - stores node block addresses for
>>>>>    * inode nodes
>>>>>    * other direct nodes
>>>>> b. Cold free nid area:
>>>>>  - range: [nid #x, max nid)
>>>>>  - stores node block addresses for
>>>>>    * indirect nodes
>>>>>    * dindirect nodes
>>>>>    * xattr nodes
>>>>>
>>>>> Allocation strategy example:
>>>>>
>>>>> Free nid: '-'
>>>>> Used nid: '='
>>>>>
>>>>> 1. Initial status:
>>>>> Free Nids:   |-----------------------------------------------------------------------|
>>>>>              ^               ^                                       ^               ^
>>>>> Alloc Range: |---------------|                                       |---------------|
>>>>>          hot_start        hot_end                               cold_start       cold_end
>>>>>
>>>>> 2. Free nids have run out:
>>>>> Free Nids:   |===============-----------------------------------------===============|
>>>>>              ^               ^                                       ^               ^
>>>>> Alloc Range: |===============|                                       |===============|
>>>>>          hot_start        hot_end                               cold_start       cold_end
>>>>>
>>>>> 3. Expand the hot/cold area ranges:
>>>>> Free Nids:   |===============-----------------------------------------===============|
>>>>>              ^                               ^       ^                               ^
>>>>> Alloc Range: |===============----------------|       |----------------===============|
>>>>>          hot_start                     hot_end       cold_start                  cold_end
>>>>>
>>>>> 4. Hot free nids have run out:
>>>>> Free Nids:   |===============================-------------------------===============|
>>>>>              ^                               ^       ^                               ^
>>>>> Alloc Range: |===============================|       |----------------===============|
>>>>>          hot_start                     hot_end       cold_start                  cold_end
>>>>>
>>>>> 5. Expand the hot area range; the hot/cold area boundary is now fixed:
>>>>> Free Nids:   |===============================-------------------------===============|
>>>>>              ^                                       ^                               ^
>>>>> Alloc Range: |===============================--------|----------------===============|
>>>>>          hot_start                          hot_end(cold_start)                  cold_end
>>>>>
>>>>> Run xfstests with generic/*:
>>>>>
>>>>> before:
>>>>> node_write: 169660
>>>>> cp_count:   60118
>>>>> node/cp:    2.82
>>>>>
>>>>> after:
>>>>> node_write: 159145
>>>>> cp_count:   84501
>>>>> node/cp:    2.64
>>>>
>>>> Nice try, though I don't see much benefit from this huge patch. I guess we
>>>> may be able to find a more efficient way to address this issue rather than
>>>> changing so much stable code.
>>>
>>> IMO, on top of this, we can later add more allocation policies for managing
>>> free nid resources to get more benefit.
>>>
>>> If you're worried about code stability, we can queue this patch in the
>>> dev-test branch and test it for longer.
>>>
>>>>
>>>> How about getting a free nid from the head or the tail of the list
>>>> separately?
>>>
>>> I don't think this will help on a long-used image: the NAT table will be
>>> fragmented anyway, so we can't tell whether a free nid at the head or at the
>>> tail comes from a hot NAT block or a cold one.
>>>
>>> Anyway, I will give it a try.
>>
>> A quick test result with the patch below:
>>
>> node_write: 187837, cp_count: 76431
>
> Can we gather some real numbers from Android workloads?

No problem.
:)

Thanks,

>
> Thanks,
>
>>
>> >From 9f88ea8a36a74f1d3ed8df57ceffb1b8ae41a161 Mon Sep 17 00:00:00 2001
>> From: Chao Yu
>> Date: Fri, 20 Apr 2018 16:18:26 +0800
>> Subject: [PATCH] f2fs: separate hot/cold free nid simply
>>
>> Signed-off-by: Chao Yu
>> ---
>>  fs/f2fs/f2fs.h  |  2 +-
>>  fs/f2fs/namei.c |  2 +-
>>  fs/f2fs/node.c  | 14 +++++++++-----
>>  fs/f2fs/xattr.c |  2 +-
>>  4 files changed, 12 insertions(+), 8 deletions(-)
>>
>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>> index 7651a118faa3..adca5e8bc19a 100644
>> --- a/fs/f2fs/f2fs.h
>> +++ b/fs/f2fs/f2fs.h
>> @@ -2800,7 +2800,7 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
>>  int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc,
>>  			bool do_balance, enum iostat_type io_type);
>>  void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
>> -bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
>> +bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid, bool hot);
>>  void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
>>  void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid);
>>  int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink);
>> diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
>> index 3d59149590f7..bbf6edbb7298 100644
>> --- a/fs/f2fs/namei.c
>> +++ b/fs/f2fs/namei.c
>> @@ -37,7 +37,7 @@ static struct inode *f2fs_new_inode(struct inode *dir, umode_t mode)
>>  		return ERR_PTR(-ENOMEM);
>>
>>  	f2fs_lock_op(sbi);
>> -	if (!alloc_nid(sbi, &ino)) {
>> +	if (!alloc_nid(sbi, &ino, true)) {
>>  		f2fs_unlock_op(sbi);
>>  		err = -ENOSPC;
>>  		goto fail;
>> diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
>> index ea231f8a0cce..bbaae2f3461e 100644
>> --- a/fs/f2fs/node.c
>> +++ b/fs/f2fs/node.c
>> @@ -631,7 +631,7 @@ int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode)
>>
>>  		if (!nids[i] && mode == ALLOC_NODE) {
>>  			/* alloc new node */
>> -			if (!alloc_nid(sbi, &(nids[i]))) {
>> +			if (!alloc_nid(sbi, &(nids[i]), i == 1)) {
>>  				err = -ENOSPC;
>>  				goto release_pages;
>>  			}
>> @@ -2108,7 +2108,7 @@ void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
>>   * from second parameter of this function.
>>   * The returned nid could be used ino as well as nid when inode is created.
>>   */
>> -bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
>> +bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid, bool hot)
>>  {
>>  	struct f2fs_nm_info *nm_i = NM_I(sbi);
>>  	struct free_nid *i = NULL;
>> @@ -2129,8 +2129,12 @@ bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
>>  	/* We should not use stale free nids created by build_free_nids */
>>  	if (nm_i->nid_cnt[FREE_NID] && !on_build_free_nids(nm_i)) {
>>  		f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
>> -		i = list_first_entry(&nm_i->free_nid_list,
>> -					struct free_nid, list);
>> +		if (hot)
>> +			i = list_first_entry(&nm_i->free_nid_list,
>> +						struct free_nid, list);
>> +		else
>> +			i = list_last_entry(&nm_i->free_nid_list,
>> +						struct free_nid, list);
>>  		*nid = i->nid;
>>
>>  		__move_free_nid(sbi, i, FREE_NID, PREALLOC_NID);
>> @@ -2275,7 +2279,7 @@ int recover_xattr_data(struct inode *inode, struct page *page)
>>
>>  recover_xnid:
>>  	/* 2: update xattr nid in inode */
>> -	if (!alloc_nid(sbi, &new_xnid))
>> +	if (!alloc_nid(sbi, &new_xnid, false))
>>  		return -ENOSPC;
>>
>>  	set_new_dnode(&dn, inode, NULL, NULL, new_xnid);
>> diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
>> index d847b2b11659..46c090614563 100644
>> --- a/fs/f2fs/xattr.c
>> +++ b/fs/f2fs/xattr.c
>> @@ -398,7 +398,7 @@ static inline int write_all_xattrs(struct inode *inode, __u32 hsize,
>>  	int err = 0;
>>
>>  	if (hsize > inline_size && !F2FS_I(inode)->i_xattr_nid)
>> -		if (!alloc_nid(sbi, &new_nid))
>> +		if (!alloc_nid(sbi, &new_nid, false))
>>  			return -ENOSPC;
>>
>>  	/* write to inline xattr */
>> --
>> 2.15.0.55.gc2ece9dc4de6
>>
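The hot/cold free nid areas described in the original patch (steps 1 to 5 of the allocation example) can be modelled in a few lines of C. This is a userspace toy under stated assumptions, not f2fs code: `struct nid_space`, `alloc_nid_sketch()`, `MAX_NID` and `EXPAND_STEP` are hypothetical names, a plain `bool` array stands in for f2fs's free nid lists and NAT bitmaps, and each area grows by a fixed step. Hot nids come from the low end of [0, hot_end), cold nids from the high end of [cold_start, MAX_NID), and each area expands toward the other until they meet and the boundary is fixed. The thread does not say what happens once one side fills up after that point; this sketch simply fails the allocation, the way alloc_nid() returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sizes chosen only for illustration (hypothetical, not f2fs values). */
#define MAX_NID     64u
#define EXPAND_STEP 8u

typedef uint32_t nid_t;

/* Hypothetical model: hot area is [0, hot_end), cold area is [cold_start, MAX_NID). */
struct nid_space {
	bool used[MAX_NID];
	nid_t hot_end;
	nid_t cold_start;
};

static void nid_space_init(struct nid_space *s, nid_t initial)
{
	assert(initial <= MAX_NID / 2);	/* the two initial areas must not overlap */
	for (nid_t n = 0; n < MAX_NID; n++)
		s->used[n] = false;
	s->hot_end = initial;
	s->cold_start = MAX_NID - initial;
}

/* Number of nids not yet owned by either area. */
static nid_t unowned(const struct nid_space *s)
{
	return s->cold_start - s->hot_end;
}

/* Grow the hot area upward; returns false once it touches the cold area. */
static bool expand_hot(struct nid_space *s)
{
	nid_t room = unowned(s);

	if (room == 0)
		return false;		/* boundary is fixed */
	s->hot_end += room < EXPAND_STEP ? room : EXPAND_STEP;
	return true;
}

/* Grow the cold area downward; returns false once it touches the hot area. */
static bool expand_cold(struct nid_space *s)
{
	nid_t room = unowned(s);

	if (room == 0)
		return false;		/* boundary is fixed */
	s->cold_start -= room < EXPAND_STEP ? room : EXPAND_STEP;
	return true;
}

/* Take the lowest free hot nid or the highest free cold nid, expanding as needed. */
static bool alloc_nid_sketch(struct nid_space *s, nid_t *nid, bool hot)
{
	for (;;) {
		assert(s->hot_end <= s->cold_start);
		if (hot) {
			for (nid_t n = 0; n < s->hot_end; n++) {
				if (!s->used[n]) {
					s->used[n] = true;
					*nid = n;
					return true;
				}
			}
			if (!expand_hot(s))
				return false;
		} else {
			/* scan from MAX_NID - 1 down to cold_start */
			for (nid_t n = MAX_NID; n-- > s->cold_start; ) {
				if (!s->used[n]) {
					s->used[n] = true;
					*nid = n;
					return true;
				}
			}
			if (!expand_cold(s))
				return false;
		}
	}
}
```

In this model an inode or direct node allocation would call `alloc_nid_sketch(&s, &nid, true)`, while indirect, dindirect and xattr node allocations would pass `false`, mirroring the `hot` flag that the quoted patch adds to alloc_nid().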
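The node/cp column in the xfstests results appears to be node_write divided by cp_count: 169660 / 60118 is about 2.82, which matches the "before" row. The hypothetical helper `node_per_cp()` below recomputes that ratio. Applied to the "after" counters (159145 / 84501) it gives about 1.88 rather than the quoted 2.64, and the quick head/tail test (187837 / 76431) gives about 2.46, so one of the quoted "after" figures may contain a typo.

```c
#include <assert.h>

/*
 * Hypothetical helper: average node block writes per checkpoint, which is
 * how the node/cp column in the thread appears to be derived.
 */
static double node_per_cp(unsigned long node_write, unsigned long cp_count)
{
	assert(cp_count != 0);	/* no checkpoints, no meaningful ratio */
	return (double)node_write / (double)cp_count;
}
```

For example, `node_per_cp(169660, 60118)` is about 2.822, which rounds to the 2.82 quoted for the "before" run.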
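In the quick patch, get_dnode_of_data() passes `i == 1` as the new `hot` argument. That function walks nids[1..level] and finishes with `dn->nid = nids[level]`, so only nids[level] is the direct node; nids[1] is whichever node the inode points at directly, which is a direct node when level is 1 but an indirect or dindirect node when level is 2 or 3. If the goal is the split from the original patch (direct nodes hot, indirect and dindirect nodes cold), a condition such as `i == level` may match it more closely. The helper `node_is_hot()` below is a hypothetical illustration of that mapping, not part of either patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: in a get_dnode_of_data()-style walk over
 * nids[1..level], only the last node (i == level) is a direct node,
 * so only that allocation is treated as hot.
 */
static bool node_is_hot(int i, int level)
{
	assert(level >= 1 && level <= 3 && i >= 1 && i <= level);
	return i == level;
}
```

With this mapping, a level-2 walk would allocate the indirect node (i == 1) from the cold area and the direct node (i == 2) from the hot area.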