Subject: [PATCH RFC] NFSD: Optimize DRC bucket pruning
From: Chuck Lever
To: bfields@fieldses.org
Cc: linux-nfs@vger.kernel.org
Date: Mon, 20 Sep 2021 15:25:21 -0400
Message-ID: <163216587593.1058.15663218635528093628.stgit@klimt.1015granger.net>

DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
every NFSv2 and NFSv3 dispatch (i.e., it is done while the client is
waiting).

I added a trace_printk() in prune_bucket() to see just how long it
takes to prune. Here are two ends of the spectrum:

  prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
  prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
  ...
  prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining

Pruning latency is noticeable on fast transports with fast storage.
By noticeable, I mean that the latency measured here in the worst case
is the same order of magnitude as the round-trip time for cached server
operations.

We could do something like moving expired entries to an expired list
and then freeing them later instead of freeing them right in
prune_bucket(). But limiting the number of entries that can be pruned
by a lookup is simpler, and it retains more entries in the cache,
making the DRC somewhat more effective.

Comparison with a 70/30 fio 8KB 12 thread direct I/O test:

Before:

  write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets

  WRITE: 1848726 ops (30%)
    avg bytes sent per op: 8340
    avg bytes received per op: 136
    backlog wait: 0.635158
    RTT: 0.128525
    total execute time: 0.827242 (milliseconds)

After:

  write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets

  WRITE: 1891144 ops (30%)
    avg bytes sent per op: 8340
    avg bytes received per op: 136
    backlog wait: 0.616114
    RTT: 0.126842
    total execute time: 0.805348 (milliseconds)

Signed-off-by: Chuck Lever
---
 fs/nfsd/nfscache.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/nfscache.c b/fs/nfsd/nfscache.c
index 96cdf77925f3..6e0b6f3148dc 100644
--- a/fs/nfsd/nfscache.c
+++ b/fs/nfsd/nfscache.c
@@ -241,8 +241,8 @@ lru_put_end(struct nfsd_drc_bucket *b, struct svc_cacherep *rp)
 	list_move_tail(&rp->c_lru, &b->lru_head);
 }
 
-static long
-prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
+static long prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn,
+			 unsigned int max)
 {
 	struct svc_cacherep *rp, *tmp;
 	long freed = 0;
@@ -258,11 +258,17 @@ prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
 		    time_before(jiffies, rp->c_timestamp + RC_EXPIRE))
 			break;
 		nfsd_reply_cache_free_locked(b, rp, nn);
-		freed++;
+		if (max && freed++ > max)
+			break;
 	}
 	return freed;
 }
 
+static long nfsd_prune_bucket(struct nfsd_drc_bucket *b, struct nfsd_net *nn)
+{
+	return prune_bucket(b, nn, 3);
+}
+
 /*
  * Walk the LRU list and prune off entries that are older than RC_EXPIRE.
  * Also prune the oldest ones when the total exceeds the max number of entries.
@@ -279,7 +285,7 @@ prune_cache_entries(struct nfsd_net *nn)
 		if (list_empty(&b->lru_head))
 			continue;
 		spin_lock(&b->cache_lock);
-		freed += prune_bucket(b, nn);
+		freed += prune_bucket(b, nn, 0);
 		spin_unlock(&b->cache_lock);
 	}
 	return freed;
@@ -453,8 +459,7 @@ int nfsd_cache_lookup(struct svc_rqst *rqstp)
 	atomic_inc(&nn->num_drc_entries);
 	nfsd_stats_drc_mem_usage_add(nn, sizeof(*rp));
 
-	/* go ahead and prune the cache */
-	prune_bucket(b, nn);
+	nfsd_prune_bucket(b, nn);
 out_unlock:
 	spin_unlock(&b->cache_lock);