Date: Tue, 5 Dec 2023 12:29:43 +0100
From: Christian Brauner
To: NeilBrown
Cc: Dave Chinner, Al Viro, Jens Axboe, Oleg Nesterov, Chuck Lever,
    Jeff Layton, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nfs@vger.kernel.org
Subject: Re: [PATCH 1/2] Allow a kthread to declare that it calls task_work_run()
Message-ID: <20231205-liedtexte-quantenphysik-804eab7f97d8@brauner>
References: <20231204014042.6754-1-neilb@suse.de>
    <20231204014042.6754-2-neilb@suse.de>
    <170176610023.7109.11175368186869568821@noble.neil.brown.name>
In-Reply-To: <170176610023.7109.11175368186869568821@noble.neil.brown.name>

On Tue, Dec 05, 2023 at 07:48:20PM +1100, NeilBrown wrote:
> On Tue, 05 Dec 2023, Dave Chinner wrote:
> > On Mon, Dec 04, 2023 at 12:36:41PM +1100, NeilBrown wrote:
> > > User-space processes always call task_work_run() as needed when
> > > returning from a system call. Kernel-threads generally do not.
> > > Because of this some work that is best run in the task_works context
> > > (guaranteed that no locks are held) cannot be queued to task_works from
> > > kernel threads and so are queued to a (single) work_time to be managed
> > > on a work queue.
> > >
> > > This means that any cost for doing the work is not imposed on the kernel
> > > thread, and importantly excessive amounts of work cannot apply
> > > back-pressure to reduce the amount of new work queued.
> > >
> > > I have evidence from a customer site when nfsd (which runs as kernel
> > > threads) is being asked to modify many millions of files which causes
> > > sufficient memory pressure that some cache (in XFS I think) gets cleaned
> > > earlier than would be ideal. When __dput (from the workqueue) calls
> > > __dentry_kill, xfs_fs_destroy_inode() needs to synchronously read back
> > > previously cached info from storage.
> >
> > We fixed that specific XFS problem in 5.9.
> >
> > https://lore.kernel.org/linux-xfs/20200622081605.1818434-1-david@fromorbit.com/
>
> Good to know - thanks.
>
> >
> > Can you reproduce these issues on a current TOT kernel?
>
> I haven't tried. I don't know if I know enough details of the work load
> to attempt it.
>
> >
> > If not, there's no bugs to fix in the upstream kernel. If you can,
> > then we've got more XFS issues to work through and fix.
> >
> > Fundamentally, though, we should not be papering over an XFS issue
> > by changing how core task_work infrastructure is used. So let's deal
> > with the XFS issue first....
>
> I disagree. This customer experience has demonstrated both a bug in XFS
> and bug in the interaction between fput, task_work, and nfsd.
>
> If a bug in a filesystem that only causes a modest performance impact
> when used through the syscall API can bring the system to its knees
> through memory exhaustion when used by nfsd, then that is a robustness
> issue for nfsd.
>
> I want to fix that robustness issue so that unusual behaviour in
> filesystems does not cause out-of-proportion bad behaviour in nfsd.
>
> I highlighted this in the cover letter to the first version of my patch:
>
> https://lore.kernel.org/all/170112272125.7109.6245462722883333440@noble.neil.brown.name/
>
> While this might point to a problem with the filesystem not handling the
> final close efficiently, such problems should only hurt throughput, not
> lead to memory exhaustion.

I'm still confused about this memory exhaustion claim? If this is a
filesystem problem it's pretty annoying that we have to work around it
by exposing task work to random modules.
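
For readers following the thread, the back-pressure argument quoted above
can be illustrated with a toy model. This is only a sketch under stated
assumptions: it is plain Python, not kernel code, and it does not use
task_work or workqueues. The function names, the time units, and the
single-worker FIFO model are all hypothetical and chosen for illustration.
It compares a producer that hands every cleanup to one separate worker
with a producer that pays for its own cleanup before issuing the next
request.

```python
def peak_backlog_offloaded(n_requests, request_cost, cleanup_cost):
    """Peak number of pending cleanups when one separate worker does them.

    Hypothetical model: the producer issues a request every `request_cost`
    time units and never waits; a single worker retires queued cleanups
    in FIFO order, taking `cleanup_cost` time units for each.
    """
    finish_times = []      # completion time of each queued cleanup, in order
    worker_free_at = 0     # time at which the worker becomes idle
    completed = 0          # index of the first cleanup not yet finished
    peak = 0
    for i in range(1, n_requests + 1):
        now = i * request_cost
        start = max(worker_free_at, now)
        worker_free_at = start + cleanup_cost
        finish_times.append(worker_free_at)
        # Retire every cleanup the worker has finished by `now`.
        while completed < len(finish_times) and finish_times[completed] <= now:
            completed += 1
        peak = max(peak, len(finish_times) - completed)
    return peak


def peak_backlog_inline(n_requests, request_cost, cleanup_cost):
    """Peak backlog and total elapsed time when the producer cleans up itself.

    Hypothetical model: after each request the producer runs the cleanup
    it just generated before starting the next request, so the cost of
    cleanup slows the producer down.
    """
    now = 0
    pending = 0
    peak = 0
    for _ in range(n_requests):
        now += request_cost
        pending += 1
        peak = max(peak, pending)
        now += cleanup_cost    # the producer pays for its own cleanup
        pending -= 1
    return peak, now
```

In this model, 1000 requests costing 1 unit each with cleanups costing
4 units each give a peak backlog of 751 pending items when offloaded,
and a peak of 1 when run inline, at the price of 5000 elapsed time units
for the producer. Whether the real nfsd case behaves like this depends on
the actual cleanup cost in the filesystem, which the model does not
attempt to capture.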