Date: Mon, 31 Jan 2022 13:12:10 +0000
From: Matthew Wilcox
To: NeilBrown
Cc: Andrew Morton, Jeff Layton, Ilya Dryomov, Miklos Szeredi,
	Trond Myklebust, Anna Schumaker, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/3] fuse: remove reliance on bdi congestion
References: <164360127045.4233.2606812444285122570.stgit@noble.brown>
	<164360183348.4233.761031466326833349.stgit@noble.brown>
	<164360446180.18996.6767388833611575467@noble.neil.brown.name>
In-Reply-To: <164360446180.18996.6767388833611575467@noble.neil.brown.name>

On Mon, Jan 31, 2022 at 03:47:41PM +1100, NeilBrown wrote:
> On Mon, 31 Jan 2022, Matthew Wilcox wrote:
> > > +++ b/fs/fuse/file.c
> > > @@ -958,6 +958,8 @@ static void fuse_readahead(struct readahead_control *rac)
> > >
> > >  	if (fuse_is_bad(inode))
> > >  		return;
> > > +	if (fc->num_background >= fc->congestion_threshold)
> > > +		return;
> >
> > This seems like a bad idea to me.  If we don't even start reads on
> > readahead pages, they'll get ->readpage called on them one at a time
> > and the reading thread will block.  It's going to lead to some nasty
> > performance problems, exactly when you don't want them.  Better to
> > queue the reads internally and wait for congestion to ease before
> > submitting the read.
>
> Isn't that exactly what happens now?  page_cache_async_ra() sees that
> inode_read_congested() returns true, so it doesn't start readahead.
> ???

It's rather different.  Imagine the readahead window has expanded to
256kB (64 pages).  Today, we see congestion and don't do anything.  That
means we miss the async readahead opportunity, find a missing page and
end up calling into page_cache_sync_ra(), by which time we may or may
not be congested.
If the inode_read_congested() in page_cache_async_ra() is removed and
the patch above is added to replace it, we'll allocate those 64 pages
and add them to the page cache.  But then we'll return without starting
IO.  When we hit one of those !uptodate pages, we'll call ->readpage on
it, but we won't do anything to the other 63 pages.  So we'll go through
a protracted slow period of sending 64 reads, one at a time, whether or
not congestion has eased.  Then we'll hit a missing page and proceed to
the sync ra case as above.

(I'm assuming this is a workload which does a linear scan and so
readahead is actually effective.)