Date: Sat, 15 Apr 2023 18:09:12 +0100
From: Matthew Wilcox
To: Hannes Reinecke
Cc: Luis Chamberlain, Pankaj Raghav, brauner@kernel.org,
	viro@zeniv.linux.org.uk, akpm@linux-foundation.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	gost.dev@samsung.com
Subject: Re: [RFC 0/4] convert create_page_buffers to create_folio_buffers
References: <20230414110821.21548-1-p.raghav@samsung.com>
	<1e68a118-d177-a218-5139-c8f13793dbbf@suse.de>
	<31765c8c-e895-4207-2b8c-39f6c7c83ece@suse.de>
In-Reply-To: <31765c8c-e895-4207-2b8c-39f6c7c83ece@suse.de>

On Sat, Apr 15, 2023 at 03:14:33PM +0200, Hannes Reinecke wrote:
> On 4/15/23 05:44, Matthew Wilcox wrote:
> > I do wonder how much it's worth doing this vs switching to non-BH
> > methods.  I appreciate that's a lot of work still.
>
> That's what I've been wondering, too.
>
> I would _vastly_ prefer to switch over to iomap; however, the blasted
> sb_bread() is getting in the way. Currently iomap only runs on entire
> pages / folios, but a lot of (older) filesystems insist on doing 512

Hang on, no, iomap can issue sub-page reads.  eg iomap_read_folio_sync()
will read the parts of the folio which have not yet been read when
called from __iomap_write_begin().
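To see where the sub-folio read happens, here is a condensed paraphrase
of the relevant loop in fs/iomap/buffered-io.c from around this era.
It is a sketch, not the literal kernel code: iomap_adjust_read_range()
and iomap_read_folio_sync() are static helpers private to that file,
the wrapper name below is invented for illustration, and the
skip/zeroing branches are left out.

/*
 * Condensed paraphrase of the partial-read path in
 * __iomap_write_begin().  Illustrative only; the real code also
 * skips ranges the write fully overwrites, zeroes holes and
 * unwritten extents instead of reading them, and marks each range
 * uptodate after reading it.
 */
static int write_begin_read_partial(struct inode *inode, loff_t pos,
		size_t len, struct folio *folio, const struct iomap *srcmap)
{
	loff_t block_start = round_down(pos, i_blocksize(inode));
	loff_t block_end = round_up(pos + len, i_blocksize(inode));
	size_t poff, plen;
	int ret;

	do {
		/*
		 * Trim to the next run of blocks in this folio which
		 * are not yet uptodate; already-uptodate blocks are
		 * skipped entirely.
		 */
		iomap_adjust_read_range(inode, folio, &block_start,
				block_end - block_start, &poff, &plen);
		if (plen == 0)
			break;

		/*
		 * Synchronous read of just plen bytes at offset poff
		 * within the folio -- a sub-page read, not a read of
		 * the whole folio.
		 */
		ret = iomap_read_folio_sync(block_start, folio,
				poff, plen, srcmap);
		if (ret)
			return ret;
	} while ((block_start += plen) < block_end);

	return 0;
}

The point is that the I/O here is bounded by block granularity, not
folio granularity: only the not-yet-uptodate blocks that the write
doesn't fully cover are ever read from disk.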
> byte I/O. While this seems logical (seeing that 512 bytes is the
> default and, in most cases, the only supported sector size), the
> question is whether _we_ on the Linux side need to do that.
> We _could_ upgrade to always do full page I/O; there's a good
> chance we'll be using the entire page anyway eventually.
> And with storage bandwidth getting larger and larger we might even
> get a performance boost there.

I think we need to look at this from the filesystem side.  What do
filesystems actually want to do?

The first thing is they want to read the superblock.  That's either
going to be immediately freed ("Oh, this isn't a JFS filesystem after
all") or it's going to hang around indefinitely.  There's no particular
need to keep it in any kind of cache (buffer or page).  Except ... we
want to probe a dozen different filesystems, and half of them keep
their superblock at the same offset from the start of the block device.
So we do want to keep it cached.  That argues for using the page cache,
at least to read it.

Now, do we want userspace to be able to dd a new superblock into place
and have the mounted filesystem see it?  I suspect that confuses just
about every filesystem out there.  So I think the right answer is to
read the page into the bdev's page cache and then copy it into a
kmalloc'ed buffer which the filesystem is then responsible for freeing.

The filesystem is also responsible for writing the superblock back (so
that's another API we need), and for a journalled filesystem, that
write needs to fit into the journalling scheme.  We may also need to
write back multiple copies of the superblock, possibly with slight
modifications.

There are a lot of considerations here, and I don't feel I have enough
of an appreciation of filesystem needs to come up with a decent API.
I'd hope we can get a good discussion going at LSFMM.
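To put a rough shape on the read-and-copy idea above, here is a minimal
sketch of what such a pair of helpers might look like.  Everything
below is hypothetical: bdev_read_super_copy() and
bdev_write_super_copy() are invented names, not existing kernel API,
and the sketch assumes the superblock fits within a single page and
ignores journalling and multi-copy writeback entirely.

/*
 * Hypothetical API sketch -- not existing kernel code.
 *
 * Read the superblock through the bdev's page cache, returning a
 * kmalloc'ed copy that the filesystem owns and must kfree().  Later
 * changes to the page-cache folio (e.g. userspace dd'ing over the
 * device) do not affect this private copy.
 */
static void *bdev_read_super_copy(struct block_device *bdev,
				  loff_t pos, size_t len)
{
	struct address_space *mapping = bdev->bd_inode->i_mapping;
	struct folio *folio;
	void *copy, *kaddr;

	/* Read (or find already cached) the folio covering 'pos'. */
	folio = read_mapping_folio(mapping, pos >> PAGE_SHIFT, NULL);
	if (IS_ERR(folio))
		return folio;

	copy = kmalloc(len, GFP_KERNEL);
	if (copy) {
		kaddr = kmap_local_folio(folio, offset_in_folio(folio, pos));
		memcpy(copy, kaddr, len);
		kunmap_local(kaddr);
	}
	folio_put(folio);
	return copy ? copy : ERR_PTR(-ENOMEM);
}

/*
 * Write a (possibly modified) copy back: update the cached folio and
 * mark it dirty so ordinary bdev writeback picks it up.  A journalled
 * filesystem would route this through its transaction machinery
 * instead, and a filesystem keeping multiple superblock copies would
 * call this once per copy with the appropriate modifications.
 */
static int bdev_write_super_copy(struct block_device *bdev, loff_t pos,
				 const void *copy, size_t len)
{
	struct address_space *mapping = bdev->bd_inode->i_mapping;
	struct folio *folio;
	void *kaddr;

	folio = read_mapping_folio(mapping, pos >> PAGE_SHIFT, NULL);
	if (IS_ERR(folio))
		return PTR_ERR(folio);

	folio_lock(folio);
	kaddr = kmap_local_folio(folio, offset_in_folio(folio, pos));
	memcpy(kaddr, copy, len);
	kunmap_local(kaddr);
	folio_mark_dirty(folio);
	folio_unlock(folio);
	folio_put(folio);
	return 0;
}

The interesting design point is the ownership split: the page cache is
only a transport for probing and I/O, while the filesystem's working
view of the superblock lives in the private buffer, so a userspace dd
over a mounted device can no longer change the filesystem's state out
from under it.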