Date: Mon, 8 Jun 2020 19:21:22 +0000
From: Frank van der Linden
To: Bruce Fields, Trond Myklebust, Chuck Lever
Subject: nfsd filecache issues with v4
Message-ID: <20200608192122.GA19171@dev-dsk-fllinden-2c-c1893d73.us-west-2.amazon.com>
X-Mailing-List: linux-nfs@vger.kernel.org

We recently noticed that, with 5.4+ kernels, the generic/531 test takes
a very long time to finish for v4, especially when run on larger
systems.

Case in point: with a 72 VCPU, 144G EC2 instance as the client, the
test takes about 20 hours.

So, I had a look to see what was going on. First of all, the test
generates a lot of files: it starts 2 * NCPU processes, and each
process creates 50000 files. So that's 144 processes in this case,
50000 files each. It does this by setting the file ulimit to 50000 and
then just opening files, keeping them open, until it hits the limit.

So that's about 7.2 million new/open files. That's a lot, but the
problem can be triggered with far fewer than that as well.
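
For reference, the file count above can be worked out with a small
sketch like the one below. This is not taken from the generic/531
source; the helper name total_open_files is hypothetical, and it only
assumes what is stated above (2 processes per CPU, a fixed number of
files per process).

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of xfstests): number of files a
 * generic/531-style run keeps open at once, assuming it starts
 * 2 * ncpu processes and each process opens files_per_proc files.
 */
static unsigned long long total_open_files(unsigned long long ncpu,
                                           unsigned long long files_per_proc)
{
    unsigned long long nprocs;

    /* Both inputs must be positive for the result to be meaningful. */
    assert(ncpu > 0 && files_per_proc > 0);

    nprocs = 2 * ncpu;              /* 2 processes per CPU */
    return nprocs * files_per_proc; /* every file stays open */
}
```

For the 72 VCPU client, total_open_files(72, 50000) is 144 * 50000 =
7200000.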

Looking at what the server was doing, I noticed a lot of lock
contention for nfsd_file_lru. Then I noticed that
nfsd_filecache_count kept going up, reflecting the number of files
held open by the client processes, eventually reaching, for example,
that 7 million number.

So here's what happens: for NFSv4, files that are associated with an
open stateid can stick around for a long time, as long as there's no
CLOSE done on them. That's what's happening here. Also, since those
files have a refcount of >= 2 (one for the hash table, one for being
pointed to by the state), they are never eligible for removal from the
file cache. Worse, since the code calls nfsd_file_gc inline if the
upper bound is crossed (8192), every single operation that calls
nfsd_file_acquire will end up walking the entire LRU, trying to free
files, and failing every time. Walking a list with millions of files
every single time isn't great.

There are some ways to fix this behavior, like:

* Always allow v4 cached file structures to be purged from the cache.
  They will stick around, since they still have a reference, but at
  least they won't slow down cache handling to a crawl.

* Don't add v4 files to the cache to begin with.

* Since the only advantage of the file cache for v4 is the caching of
  files linked to special stateids (as far as I can tell), only cache
  files associated with special stateids.

* Don't bother with v4 files at all, and revert the changes that made
  v4 use the file cache.

In general, the resource control for files OPENed by the client is
probably an issue. Even if you fix the cache, what if there are N
clients that open millions of files and keep them open? Maybe there
should be a fallback to start using temporary open files if a client
goes beyond a reasonable limit and threatens to eat all resources.

Thoughts?

- Frank
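
To illustrate why the inline garbage-collection walk described above
gets so expensive, here is a toy model in C. It is only a sketch under
assumptions: struct toy_file, toy_gc and toy_total_visits are
hypothetical names, not the kernel's data structures or functions, and
the model simply treats an entry with a refcount of 2 or more as
pinned and runs the walk on every acquire once the count is above a
threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a cached file entry; NOT the kernel's struct nfsd_file. */
struct toy_file {
    int refcount; /* 1: held only by the cache; >= 2: also pinned by open state */
    int cached;   /* nonzero while the entry is still in the toy cache */
};

/*
 * Walk the first n entries, dropping unpinned ones from the toy cache.
 * Returns how many cached entries were visited; *freed gets the number dropped.
 */
static size_t toy_gc(struct toy_file *files, size_t n, size_t *freed)
{
    size_t visited = 0;

    *freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (!files[i].cached)
            continue;
        visited++;
        if (files[i].refcount >= 2)
            continue; /* pinned by open state: cannot be freed */
        files[i].cached = 0;
        (*freed)++;
    }
    return visited;
}

/*
 * Simulate n acquires of files that all stay open (refcount 2), running
 * toy_gc inline whenever the cache count is above threshold.
 * Returns the total number of entries visited across all walks.
 */
static unsigned long long toy_total_visits(struct toy_file *files, size_t n,
                                           size_t threshold)
{
    unsigned long long total = 0;
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        size_t freed;

        files[i].refcount = 2; /* one ref for the cache, one for the open state */
        files[i].cached = 1;
        count++;
        if (count > threshold) {
            total += toy_gc(files, i + 1, &freed);
            count -= freed; /* stays unchanged here: nothing is ever freed */
        }
    }
    return total;
}
```

Because nothing is ever freed in this model, each walk past the
threshold visits every cached entry, so the total work grows roughly
with the square of the number of open files.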