Date: Tue, 28 Nov 2023 14:51:52 +0100
From: Christian Brauner
To: NeilBrown
Cc: Al Viro, Christian Brauner, Jeff Layton, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org
Subject: Re: [PATCH/RFC] core/nfsd: allow kernel threads to use task_work.
Message-ID: <20231128-blumig-anreichern-b9d8d1dc49b3@brauner>
References: <170112272125.7109.6245462722883333440@noble.neil.brown.name>
 <170113056683.7109.13851405274459689039@noble.neil.brown.name>
In-Reply-To: <170113056683.7109.13851405274459689039@noble.neil.brown.name>

[Reusing the trimmed Cc]

On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> On Tue, 28 Nov 2023, Chuck Lever wrote:
> > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > >
> > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > delayed_fput_lists nearly twice as fast as they are retired by a single
> > > work-queue thread running delayed_fput(). As you might imagine this
> > > does not end well (20 million files in the queue at the time a snapshot
> > > was taken for analysis).
> > >
> > > While this might point to a problem with the filesystem not handling the
> > > final close efficiently, such problems should only hurt throughput, not
> > > lead to memory exhaustion.
> >
> > I have this patch queued for v6.8:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
>
> Thanks....
> I think that change is good, but I don't think it addresses the problem
> mentioned in the description, and it is not directly relevant to the
> problem I saw ... though it is complicated.
>
> The problem "workqueue ... hogged cpu..." probably means that
> nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> That will stop it from hogging the CPU whether it is tied to one CPU or
> free to roam.
>
> Also, that work is calling filp_close(), which primarily calls
> filp_flush().
> It also calls fput(), but that does minimal work. If there is much work
> to do then that is offloaded to another work-item. *That* is the
> work-item that I had problems with.
>
> The problem I saw was with an older kernel which didn't have the nfsd
> file cache and so probably is calling filp_close more often. So maybe
> my patch isn't so important now. Particularly as nfsd now isn't closing
> most files in-task but instead offloads that to another task. So the
> final fput will not be handled by the nfsd task either.
>
> But I think there is room for improvement. Gathering lots of files
> together into a list and closing them sequentially is not going to be as
> efficient as closing them in parallel.
>
> > > For normal threads, the thread that closes the file also calls the
> > > final fput, so there is natural rate limiting preventing excessive growth
> > > in the list of delayed fputs. For kernel threads, and particularly for
> > > nfsd, delays in the final fput do not impose any throttling to prevent
> > > the thread from closing more files.
> >
> > I don't think we want to block nfsd threads waiting for files to
> > close. Won't that be a potential denial of service?
>
> Not as much as the denial of service caused by memory exhaustion due to
> an indefinitely growing list of files waiting to be closed by a single
> workqueue thread.

It seems less likely that you run into memory exhaustion than a DoS because
nfsd() is busy closing fds, especially since you default to a single nfsd
thread afaict.
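For illustration, a minimal sketch of the cond_resched() suggestion quoted
above. The loop is modelled on the shape of nfsd_file_dispose_list() in
fs/nfsd/filecache.c, but struct nfsd_file, its nf_lru list head and the
nfsd_file_free() helper are written from memory here and may not match a
given kernel version:

#include <linux/list.h>
#include <linux/sched.h>	/* cond_resched() */

/*
 * Sketch only: a dispose loop in the rough shape of
 * nfsd_file_dispose_list(), with the suggested cond_resched() added.
 */
static void example_dispose_list(struct list_head *dispose)
{
	struct nfsd_file *nf;

	while (!list_empty(dispose)) {
		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
		list_del_init(&nf->nf_lru);

		/* may end up in filp_close()/filp_flush() and the final fput */
		nfsd_file_free(nf);

		/* yield so a worker pinned to one CPU cannot hog it */
		cond_resched();
	}
}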
> I think it is perfectly reasonable that when handling an NFSv4 CLOSE,
> the nfsd thread should completely handle that request including all the
> flush and ->release etc. If that causes any denial of service, then
> simply increase the number of nfsd threads.

But isn't that a significant behavioral change? So I would expect this to be
made configurable via a module or Kconfig option?

> For NFSv3 it is more complex. On the kernel where I saw a problem the
> filp_close happened after each READ or WRITE (though I think the customer
> was using NFSv4...). With the file cache there is no thread that is
> obviously responsible for the close.
> To get the sort of throttling that I think is needed, we could possibly
> have each "nfsd_open" check if there are pending closes, and wait for
> some small amount of progress.
>
> But I don't think it is reasonable for the nfsd threads to take none of
> the burden of closing files as that can result in imbalance.

It feels like this really needs to be tested under a workload similar to the
one in question to see whether it is a viable solution.
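To make the "check for pending closes and wait for some small amount of
progress" idea above concrete, here is a purely hypothetical sketch. The
nfsd_* helpers below do not exist in the kernel (only cond_resched() is a
real API); this only shows the shape of the throttling, not an actual
implementation:

/*
 * Hypothetical sketch of the throttling idea above: before opening another
 * file, an nfsd thread helps retire a bounded number of pending closes.
 * nfsd_have_pending_closes() and nfsd_retire_one_pending_close() are
 * made-up names used only for illustration.
 */
static void nfsd_throttle_pending_closes(void)
{
	int budget = 8;		/* "some small amount of progress" */

	while (budget-- > 0 && nfsd_have_pending_closes()) {
		if (!nfsd_retire_one_pending_close())
			break;		/* queue drained by someone else */
		cond_resched();
	}
}

Something of this shape would be called from "nfsd_open" (or its modern
equivalent) so that the threads generating closes also pay part of their
cost, which is exactly the behavioral question discussed above.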