From: alex chen
Date: Thu, 4 Jan 2018 09:39:23 +0800
To: Gang He
CC: Andrew Morton
Subject: Re: [Ocfs2-devel] [PATCH v2 2/2] ocfs2: add trimfs lock to avoid duplicated trims in cluster
In-Reply-To: <1513228484-2084-2-git-send-email-ghe@suse.com>

Hi Gang,

On 2017/12/14 13:14, Gang He wrote:
> As you know, ocfs2 has support trim the underlying disk via
> fstrim command. But there is a problem, ocfs2 is a shared disk
> cluster file system, if the user configures a scheduled fstrim
> job on each file system node, this will trigger multiple nodes
> trim a shared disk simultaneously, it is very wasteful for CPU
> and IO consumption, also might negatively affect the lifetime
> of poor-quality SSD devices.
> Then, we introduce a trimfs dlm lock to communicate with each
> other in this case, which will make only one fstrim command to
> do the trimming on a shared disk among the cluster, the fstrim
> commands from the other nodes should wait for the first fstrim
> to finish and returned success directly, to avoid running a the
> same trim on the shared disk again.
>
For the same purpose, could we take the global bitmap meta lock in EX mode
instead of adding a new trimfs DLM lock?

Thanks,
Alex

> Compare with first version, I change the fstrim commands' returned
> value and behavior in case which meets a fstrim command is running
> on a shared disk.
>
> Signed-off-by: Gang He
> ---
>  fs/ocfs2/alloc.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
>
> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
> index ab5105f..5c9c3e2 100644
> --- a/fs/ocfs2/alloc.c
> +++ b/fs/ocfs2/alloc.c
> @@ -7382,6 +7382,7 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  	struct buffer_head *gd_bh = NULL;
>  	struct ocfs2_dinode *main_bm;
>  	struct ocfs2_group_desc *gd = NULL;
> +	struct ocfs2_trim_fs_info info, *pinfo = NULL;
>
>  	start = range->start >> osb->s_clustersize_bits;
>  	len = range->len >> osb->s_clustersize_bits;
> @@ -7419,6 +7420,42 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
>
>  	trace_ocfs2_trim_fs(start, len, minlen);
>
> +	ocfs2_trim_fs_lock_res_init(osb);
> +	ret = ocfs2_trim_fs_lock(osb, NULL, 1);
> +	if (ret < 0) {
> +		if (ret != -EAGAIN) {
> +			mlog_errno(ret);
> +			ocfs2_trim_fs_lock_res_uninit(osb);
> +			goto out_unlock;
> +		}
> +
> +		mlog(ML_NOTICE, "Wait for trim on device (%s) to "
> +		     "finish, which is running from another node.\n",
> +		     osb->dev_str);
> +		ret = ocfs2_trim_fs_lock(osb, &info, 0);
> +		if (ret < 0) {
> +			mlog_errno(ret);
> +			ocfs2_trim_fs_lock_res_uninit(osb);
> +			goto out_unlock;
> +		}
> +
> +		if (info.tf_valid && info.tf_success &&
> +		    info.tf_start == start && info.tf_len == len &&
> +		    info.tf_minlen == minlen) {
> +			/* Avoid sending duplicated trim to a shared device */
> +			mlog(ML_NOTICE, "The same trim on device (%s) was "
> +			     "just done from node (%u), return.\n",
> +			     osb->dev_str, info.tf_nodenum);
> +			range->len = info.tf_trimlen;
> +			goto out_trimunlock;
> +		}
> +	}
> +
> +	info.tf_nodenum = osb->node_num;
> +	info.tf_start = start;
> +	info.tf_len = len;
> +	info.tf_minlen = minlen;
> +
>  	/* Determine first and last group to examine based on start and len */
>  	first_group = ocfs2_which_cluster_group(main_bm_inode, start);
>  	if (first_group == osb->first_cluster_group_blkno)
> @@ -7463,6 +7500,13 @@ int ocfs2_trim_fs(struct super_block *sb, struct fstrim_range *range)
>  		group += ocfs2_clusters_to_blocks(sb, osb->bitmap_cpg);
>  	}
>  	range->len = trimmed * sb->s_blocksize;
> +
> +	info.tf_trimlen = range->len;
> +	info.tf_success = (ret ? 0 : 1);
> +	pinfo = &info;
> +out_trimunlock:
> +	ocfs2_trim_fs_unlock(osb, pinfo);
> +	ocfs2_trim_fs_lock_res_uninit(osb);
>  out_unlock:
>  	ocfs2_inode_unlock(main_bm_inode, 0);
>  	brelse(main_bm_bh);
>