From: Bernd Schubert
To: Josef Bacik
Cc: Miklos Szeredi, Jingbo Xu, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, lege.wang@jaguarmicro.com, "Matthew Wilcox (Oracle)", linux-mm@kvack.org
Subject: Re: [HELP] FUSE writeback performance bottleneck
Date: Tue, 4 Jun 2024 23:39:17 +0200
Message-ID: <6853a389-031b-4bd6-a300-dea878979d8c@fastmail.fm>
In-Reply-To: <20240604165319.GG3413@localhost.localdomain>

On 6/4/24 18:53, Josef Bacik wrote:
> On Tue, Jun 04, 2024 at 04:13:25PM +0200, Bernd Schubert wrote:
>>
>>
>> On 6/4/24 12:02, Miklos Szeredi wrote:
>>> On Tue, 4 Jun 2024 at 11:32, Bernd Schubert wrote:
>>>
>>>> Back to the background for the copy, so it copies pages to avoid
>>>> blocking on memory reclaim. With that allocation it in fact increases
>>>> memory pressure even more. Isn't the right solution to mark those pages
>>>> as not reclaimable and to avoid blocking on it? Which is what the tmp
>>>> pages do, just not in a beautiful way.
>>>
>>> Copying to the tmp page is the same as marking the pages as
>>> non-reclaimable and non-syncable.
>>>
>>> Conceptually it would be nice to only copy when there's something
>>> actually waiting for writeback on the page.
>>>
>>> Note: normally the WRITE request would be copied to userspace along
>>> with the contents of the pages very soon after starting writeback.
>>> After this the contents of the page no longer matter, and we can just
>>> clear writeback without doing the copy.
>>>
>>> But if the request gets stuck in the input queue before being copied
>>> to userspace, then deadlock can still happen if the server blocks on
>>> direct reclaim and won't continue with processing the queue. And
>>> sync(2) will also block in that case.
>>>
>>> So we'd somehow need to handle stuck WRITE requests. I don't see an
>>> easy way to do this "on demand", when something actually starts
>>> waiting on PG_writeback. Alternatively the page copy could be done
>>> after a timeout, which is ugly, but much easier to implement.
>>
>> I think the timeout method would only work if we have already allocated
>> the pages, under memory pressure page allocation might not work well.
>> But then this still seems to be a workaround, because we don't take any
>> less memory with these copied pages.
>> I'm going to look into mm/ if there isn't a better solution.
>
> I've thought a bit about this, and I still don't have a good solution, so I'm
> going to throw out my random thoughts and see if it helps us get to a good spot.
>
> 1. Generally we are moving away from GFP_NOFS/GFP_NOIO to instead use
> memalloc_*_save/memalloc_*_restore, so instead the process is marked being in
> these contexts. We could do something similar for FUSE, tho this gets hairy
> with things that async off request handling to other threads (which is all of
> the FUSE file systems we have internally). We'd need to have some way to
> apply this to an entire process group, but this could be a workable solution.
>

I'm not sure how either of the two (GFP_ and memalloc_) would work for
userspace allocations. Wouldn't we basically need to have a feature to
disable memory allocations for fuse userspace tasks? Hmm, maybe through
mem_cgroup. Although even then, the file system might depend on other
kernel resources (backend file system, block device or even network) that
might do allocations on their own without the knowledge of the fuse server.

> 2. Per-request timeouts. This is something we're planning on tackling for other
> reasons, but it could fit nicely here to say "if this fuse fs has a
> per-request timeout, skip the copy". That way we at least have an upper
> bound on how long we would be "deadlocked". I don't love this approach
> because it's still a deadlock until the timeout elapses, but it's an idea.

Hmm, how do we know "this fuse fs has a per-request timeout"? I don't
think we could trust initialization flags set by userspace.

>
> 3. Since we're limiting writeout per the BDI, we could just say FUSE is special,
> only one memory reclaim related writeout at a time. We flag when we're doing
> a write via memory reclaim, and then if we try to trigger writeout via memory
> reclaim again we simply reject it to avoid the deadlock. This has the
> downside that non-FUSE tasks triggering direct reclaim through FUSE will
> reclaim something else instead, and if the dirty pages from FUSE are the
> ones causing the problem we could spin a bunch evicting pages that we
> don't care about and thrashing a bit.

Isn't that what we have right now? Reclaim basically ignores fuse tmp pages.

Thanks,
Bernd
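Josef's third idea in the thread (allow at most one reclaim-driven writeout per BDI and reject re-entry instead of blocking) can be modeled as a single atomic slot. This is a minimal userspace sketch under stated assumptions, not kernel code: `struct fuse_bdi_model` and the `fuse_reclaim_writeout_*` helpers are hypothetical names, and a real implementation would have to hook into the reclaim and FUSE writeback paths rather than use a standalone flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-BDI state for one FUSE mount (illustrative model only,
 * not a real kernel structure).
 */
struct fuse_bdi_model {
	/* true while one reclaim-driven writeout is in flight */
	atomic_bool reclaim_writeout_busy;
};

void fuse_bdi_model_init(struct fuse_bdi_model *bdi)
{
	atomic_init(&bdi->reclaim_writeout_busy, false);
}

/*
 * Called when reclaim wants to write back a dirty FUSE page.
 * Returns true if the caller claimed the single reclaim-writeout slot;
 * false if another reclaim-driven writeout is already in flight, in which
 * case reclaim should skip this page instead of blocking on the server.
 */
bool fuse_reclaim_writeout_begin(struct fuse_bdi_model *bdi)
{
	/* atomic_exchange returns the previous value: false means we won. */
	return !atomic_exchange(&bdi->reclaim_writeout_busy, true);
}

/* Release the slot once the reclaim-driven WRITE has completed. */
void fuse_reclaim_writeout_end(struct fuse_bdi_model *bdi)
{
	atomic_store(&bdi->reclaim_writeout_busy, false);
}
```

A reclaim path would call `fuse_reclaim_writeout_begin()` before issuing the WRITE, call `fuse_reclaim_writeout_end()` once it completes, and skip the page when begin returns false. That false branch is exactly where the downside Josef describes appears: reclaim moves on to other pages, which can thrash if the FUSE dirty pages are the real source of memory pressure.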