Date: Tue, 4 Jun 2024 12:53:19 -0400
From: Josef Bacik
To: Bernd Schubert
Cc: Miklos Szeredi, Jingbo Xu, "linux-fsdevel@vger.kernel.org",
 "linux-kernel@vger.kernel.org", lege.wang@jaguarmicro.com,
 "Matthew Wilcox (Oracle)", "linux-mm@kvack.org"
Subject: Re: [HELP] FUSE writeback performance bottleneck
Message-ID:
 <20240604165319.GG3413@localhost.localdomain>
References: <495d2400-1d96-4924-99d3-8b2952e05fc3@linux.alibaba.com>
 <67771830-977f-4fca-9d0b-0126abf120a5@fastmail.fm>
 <2f834b5c-d591-43c5-86ba-18509d77a865@fastmail.fm>
 <21741978-a604-4054-8af9-793085925c82@fastmail.fm>
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <21741978-a604-4054-8af9-793085925c82@fastmail.fm>

On Tue, Jun 04, 2024 at 04:13:25PM +0200, Bernd Schubert wrote:
>
>
> On 6/4/24 12:02, Miklos Szeredi wrote:
> > On Tue, 4 Jun 2024 at 11:32, Bernd Schubert wrote:
> >
> >> Back to the background for the copy, so it copies pages to avoid
> >> blocking on memory reclaim. With that allocation it in fact increases
> >> memory pressure even more. Isn't the right solution to mark those pages
> >> as not reclaimable and to avoid blocking on it? Which is what the tmp
> >> pages do, just not in beautiful way.
> >
> > Copying to the tmp page is the same as marking the pages as
> > non-reclaimable and non-syncable.
> >
> > Conceptually it would be nice to only copy when there's something
> > actually waiting for writeback on the page.
> >
> > Note: normally the WRITE request would be copied to userspace along
> > with the contents of the pages very soon after starting writeback.
> > After this the contents of the page no longer matter, and we can just
> > clear writeback without doing the copy.
> >
> > But if the request gets stuck in the input queue before being copied
> > to userspace, then deadlock can still happen if the server blocks on
> > direct reclaim and won't continue with processing the queue. And
> > sync(2) will also block in that case.
> >
> > So we'd somehow need to handle stuck WRITE requests. I don't see an
> > easy way to do this "on demand", when something actually starts
> > waiting on PG_writeback.
> > Alternatively the page copy could be done
> > after a timeout, which is ugly, but much easier to implement.
>
> I think the timeout method would only work if we have already allocated
> the pages, under memory pressure page allocation might not work well.
> But then this still seems to be a workaround, because we don't take any
> less memory with these copied pages.
> I'm going to look into mm/ if there isn't a better solution.

I've thought a bit about this, and I still don't have a good solution, so
I'm going to throw out my random thoughts and see if it helps us get to a
good spot.

1. Generally we are moving away from GFP_NOFS/GFP_NOIO to instead use
   memalloc_*_save/memalloc_*_restore, so the process itself is marked
   as being in these contexts.  We could do something similar for FUSE,
   though this gets hairy with things that async off request handling to
   other threads (which is all of the FUSE file systems we have
   internally).  We'd need to have some way to apply this to an entire
   process group, but this could be a workable solution.

2. Per-request timeouts.  This is something we're planning on tackling
   for other reasons, but it could fit nicely here to say "if this fuse
   fs has a per-request timeout, skip the copy".  That way we at least
   have an upper bound on how long we would be "deadlocked".  I don't
   love this approach because it's still a deadlock until the timeout
   elapses, but it's an idea.

3. Since we're limiting writeout per the BDI, we could just say FUSE is
   special, only one memory reclaim related writeout at a time.  We flag
   when we're doing a write via memory reclaim, and then if we try to
   trigger writeout via memory reclaim again we simply reject it to
   avoid the deadlock.  This has the downside that non-FUSE things that
   trigger direct reclaim through FUSE will end up reclaiming something
   else, and if the dirty pages from FUSE are the ones causing the
   problem we could spin a bunch evicting pages that we don't care about
   and thrash a bit.

As I said, all of these have downsides.  I think #1 is probably the most
workable, but I haven't thought about it super thoroughly.

Thanks,

Josef
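
For illustration only, the flag described in idea 3 could be modelled in userspace C roughly as below. This is a sketch under assumptions, not kernel code and not a proposed patch: struct reclaim_gate, reclaim_writeout_try_begin() and reclaim_writeout_end() are hypothetical names, and where such a flag would actually live (per BDI, per FUSE connection, or elsewhere) is left open.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical gate allowing at most one reclaim-triggered writeout at a
 * time. Userspace model only; the name and placement are assumptions.
 */
struct reclaim_gate {
	atomic_flag writeout_in_flight;
};

#define RECLAIM_GATE_INIT { ATOMIC_FLAG_INIT }

/*
 * Try to start a writeout on behalf of memory reclaim.
 * Returns true if the caller now owns the single slot, false if another
 * reclaim-triggered writeout is already in flight and this one is rejected.
 */
static bool reclaim_writeout_try_begin(struct reclaim_gate *gate)
{
	/* test_and_set returns the previous value: false means the slot was free */
	return !atomic_flag_test_and_set_explicit(&gate->writeout_in_flight,
						  memory_order_acquire);
}

/* Release the slot once the reclaim-triggered writeout has completed. */
static void reclaim_writeout_end(struct reclaim_gate *gate)
{
	atomic_flag_clear_explicit(&gate->writeout_in_flight,
				   memory_order_release);
}
```

In this model a reclaim path would call reclaim_writeout_try_begin() before issuing the write, skip the page when it returns false, and call reclaim_writeout_end() when the write completes. The skip is where the downside from idea 3 comes in: the rejected caller has to go and reclaim something else.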