From: Greg Kroah-Hartman
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman, stable@vger.kernel.org, Benjamin Coddington, Trond Myklebust, Sasha Levin
Subject: [PATCH 4.9 076/116] NFS: switch back to ->iterate()
Date: Thu, 13 Feb 2020 07:20:20 -0800
Message-Id: <20200213151912.263891370@linuxfoundation.org>
In-Reply-To: <20200213151842.259660170@linuxfoundation.org>
References: <20200213151842.259660170@linuxfoundation.org>
User-Agent: quilt/0.66

From: Benjamin Coddington

[ Upstream commit b044f64513843e960f4b8d8e2e042abca1b7c029 ]

NFS has some optimizations for readdir to choose between using READDIR or
READDIRPLUS based on workload, and which NFS operation to use is determined
by subsequent interactions with lookup, d_revalidate, and getattr.

Concurrent use of nfs_readdir() via ->iterate_shared() can cause those
optimizations to repeatedly invalidate the pagecache used to store
directory entries during readdir(), which causes very poor performance for
directories with many entries (more than about 10000).

There are a couple of ways to fix this in NFS, but no fix would be as
simple as going back to ->iterate() to serialize nfs_readdir(), and
neither fix I tested performed as well as going back to ->iterate(). The
first required taking the directory's i_lock for each entry, which
resulted in terrible contention. The second added another flag to the
nfs_inode, and so kept the optimizations working for large directories.
The difference from using ->iterate() there is that much more memory is
consumed for a given workload without any performance gain.

The workings of nfs_readdir() are such that concurrent users are
serialized within read_cache_page(), waiting to retrieve pages of entries
from the server. By serializing this work in iterate_dir() instead,
contention for cache pages is reduced. Waiting processes can make an
uncontended pass over the entirety of the directory's pagecache once
previous processes have completed filling it.
v2 - Keep the bits needed for parallel lookup

Signed-off-by: Benjamin Coddington
Signed-off-by: Trond Myklebust
Signed-off-by: Sasha Levin
---
 fs/nfs/dir.c | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 1e5321d1ed226..a41df7d44bd7a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -57,7 +57,7 @@ static void nfs_readdir_clear_array(struct page*);
 const struct file_operations nfs_dir_operations = {
        .llseek         = nfs_llseek_dir,
        .read           = generic_read_dir,
-       .iterate_shared = nfs_readdir,
+       .iterate        = nfs_readdir,
        .open           = nfs_opendir,
        .release        = nfs_closedir,
        .fsync          = nfs_fsync_dir,
@@ -145,7 +145,6 @@ struct nfs_cache_array_entry {
 };
 
 struct nfs_cache_array {
-       atomic_t refcount;
        int size;
        int eof_index;
        u64 last_cookie;
@@ -201,20 +200,11 @@ void nfs_readdir_clear_array(struct page *page)
        int i;
 
        array = kmap_atomic(page);
-       if (atomic_dec_and_test(&array->refcount))
-               for (i = 0; i < array->size; i++)
-                       kfree(array->array[i].string.name);
+       for (i = 0; i < array->size; i++)
+               kfree(array->array[i].string.name);
        kunmap_atomic(array);
 }
 
-static bool grab_page(struct page *page)
-{
-       struct nfs_cache_array *array = kmap_atomic(page);
-       bool res = atomic_inc_not_zero(&array->refcount);
-       kunmap_atomic(array);
-       return res;
-}
-
 /*
  * the caller is responsible for freeing qstr.name
  * when called by nfs_readdir_add_to_array, the strings will be freed in
@@ -674,7 +664,6 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
                goto out_label_free;
        }
        memset(array, 0, sizeof(struct nfs_cache_array));
-       atomic_set(&array->refcount, 1);
        array->eof_index = -1;
 
        status = nfs_readdir_alloc_pages(pages, array_size);
@@ -737,7 +726,8 @@ int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page* page)
 static
 void cache_page_release(nfs_readdir_descriptor_t *desc)
 {
-       nfs_readdir_clear_array(desc->page);
+       if (!desc->page->mapping)
+               nfs_readdir_clear_array(desc->page);
        put_page(desc->page);
        desc->page = NULL;
 }
@@ -745,16 +735,8 @@ void cache_page_release(nfs_readdir_descriptor_t *desc)
 static
 struct page *get_cache_page(nfs_readdir_descriptor_t *desc)
 {
-       struct page *page;
-
-       for (;;) {
-               page = read_cache_page(desc->file->f_mapping,
+       return read_cache_page(desc->file->f_mapping,
                        desc->page_index, (filler_t *)nfs_readdir_filler, desc);
-               if (IS_ERR(page) || grab_page(page))
-                       break;
-               put_page(page);
-       }
-       return page;
 }
 
 /*
@@ -960,11 +942,13 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 {
+       struct inode *inode = file_inode(filp);
        struct nfs_open_dir_context *dir_ctx = filp->private_data;
 
        dfprintk(FILE, "NFS: llseek dir(%pD2, %lld, %d)\n",
                        filp, offset, whence);
 
+       inode_lock(inode);
        switch (whence) {
                case 1:
                        offset += filp->f_pos;
@@ -972,13 +956,16 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
                        if (offset >= 0)
                                break;
                default:
-                       return -EINVAL;
+                       offset = -EINVAL;
+                       goto out;
        }
        if (offset != filp->f_pos) {
                filp->f_pos = offset;
                dir_ctx->dir_cookie = 0;
                dir_ctx->duped = 0;
        }
+out:
+       inode_unlock(inode);
        return offset;
 }
-- 
2.20.1