From: Dave Chinner
To: linux-xfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 19/28] xfs: reduce kswapd blocking on inode locking.
Date: Fri, 1 Nov 2019 10:46:09 +1100
Message-Id: <20191031234618.15403-20-david@fromorbit.com>
In-Reply-To: <20191031234618.15403-1-david@fromorbit.com>
References: <20191031234618.15403-1-david@fromorbit.com>

From: Dave Chinner

When doing async inode reclaim, we grab a batch of inodes that we are
likely able to reclaim and ignore those that are already flushing.
However, when we actually go to reclaim them, the first thing we do is
lock the inode. If we are racing with something else reclaiming the
inode or flushing it because it is dirty, we block on the inode lock.
Hence we can still block kswapd here.

Further, if we flush an inode, we also cluster all the other dirty
inodes in that cluster into the same IO, flush locking them all.
However, if the workload is operating on sequential inodes (e.g.
created by a tarball extraction) most of these inodes will be
sequential in the cache and so in the same batch we've already grabbed
for reclaim scanning. As a result, it is common for all the inodes in
the batch to be dirty and for the first inode flushed to also flush
all the inodes in the reclaim batch. In that case, they are now all
flush locked and we do not want to block on them.

Hence, for async reclaim (SYNC_TRYLOCK) make sure we always use
trylock semantics and abort reclaim of an inode as quickly as we can
without blocking kswapd. This will be necessary for the upcoming
conversion to LRU lists for inode reclaim tracking.
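The trylock-and-abort flow described above can be sketched outside the
kernel with POSIX mutexes. This is an illustration only, not the XFS
implementation: struct fake_inode and try_reclaim_one() are hypothetical
stand-ins for the inode and xfs_reclaim_inode(), with the two pthread
mutexes playing the roles of the ILOCK and the flush lock.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode: one mutex per lock role. */
struct fake_inode {
	pthread_mutex_t ilock;	/* role of XFS_ILOCK_EXCL */
	pthread_mutex_t flock;	/* role of the inode flush lock */
	bool dirty;
};

/*
 * Async (kswapd-style) reclaim: take both locks with trylock only,
 * and abort at the first sign of contention rather than sleeping.
 */
static bool try_reclaim_one(struct fake_inode *ip)
{
	if (pthread_mutex_trylock(&ip->ilock) != 0)
		return false;		/* someone else holds the ILOCK: skip */
	if (pthread_mutex_trylock(&ip->flock) != 0) {
		/* already being flushed (e.g. cluster writeback): skip */
		pthread_mutex_unlock(&ip->ilock);
		return false;
	}
	ip->dirty = false;		/* "flush" the inode */
	pthread_mutex_unlock(&ip->flock);
	pthread_mutex_unlock(&ip->ilock);
	return true;
}
```

On contention, pthread_mutex_trylock() returns EBUSY instead of
sleeping, so the reclaiming thread moves on immediately — the same
property the SYNC_TRYLOCK path gets from xfs_ilock_nowait() and
xfs_iflock_nowait().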
Found via tracing: big batches of repeated lock/unlock runs on inodes
that we had just flushed by write clustering during reclaim.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_icache.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index edcc3f6bb3bf..189cf423fe8f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1104,11 +1104,23 @@ xfs_reclaim_inode(
 
 restart:
 	error = 0;
-	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	if (!xfs_iflock_nowait(ip)) {
-		if (!(sync_mode & SYNC_WAIT))
+	/*
+	 * Don't try to flush the inode if another inode in this cluster has
+	 * already flushed it after we did the initial checks in
+	 * xfs_reclaim_inode_grab().
+	 */
+	if (sync_mode & SYNC_TRYLOCK) {
+		if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
 			goto out;
-		xfs_iflock(ip);
+		if (!xfs_iflock_nowait(ip))
+			goto out_unlock;
+	} else {
+		xfs_ilock(ip, XFS_ILOCK_EXCL);
+		if (!xfs_iflock_nowait(ip)) {
+			if (!(sync_mode & SYNC_WAIT))
+				goto out_unlock;
+			xfs_iflock(ip);
+		}
 	}
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
@@ -1215,9 +1227,10 @@ xfs_reclaim_inode(
 
 out_ifunlock:
 	xfs_ifunlock(ip);
+out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 out:
 	xfs_iflags_clear(ip, XFS_IRECLAIM);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	/*
 	 * We could return -EAGAIN here to make reclaim rescan the inode tree in
 	 * a short while. However, this just burns CPU time scanning the tree
-- 
2.24.0.rc0