Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
From: David Howells
To: Anna Schumaker, Trond Myklebust, Jeff Layton, Steve French, Dominique Martinet, Mike Marshall, Miklos Szeredi
Cc: dhowells@redhat.com, "Matthew Wilcox (Oracle)", Shyam Prasad N, Linus Torvalds, linux-cachefs@redhat.com, linux-afs@lists.infradead.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net, devel@lists.orangefs.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Canvassing for network filesystem write size vs page size
References: <1017390.1628158757@warthog.procyon.org.uk> <1170464.1628168823@warthog.procyon.org.uk> <1186271.1628174281@warthog.procyon.org.uk>
Date: Thu, 05 Aug 2021 17:35:33 +0100
Message-ID: <1219713.1628181333@warthog.procyon.org.uk>
X-Mailing-List: linux-nfs@vger.kernel.org

With Willy's upcoming folio changes, from a filesystem point of view, we're
going to be looking at folios instead of pages, where:

 - a folio is a contiguous collection of pages;

 - each page in the folio might be a standard PAGE_SIZE page (4K or 64K, say)
   or a huge page (say 2M each);

 - a folio has one dirty flag and one writeback flag that apply to all
   constituent pages;

 - a complete folio is currently limited to PMD_SIZE or order 8, but could
   theoretically go up to about 2GiB before various integer fields have to be
   modified (not to mention the memory allocator).
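
To put rough numbers on the order-8 limit, here is a back-of-the-envelope
sketch.  It assumes a folio of order n spans 2**n base pages and uses the 4K
and 64K page sizes mentioned above purely as examples; folio_bytes() is a
made-up helper for illustration, not a kernel interface.

```python
# Illustrative arithmetic only; folio_bytes() is a hypothetical helper,
# not a kernel API.

KIB = 1024
MIB = 1024 * KIB


def folio_bytes(order: int, page_size: int) -> int:
    """Size in bytes of a folio of the given order, i.e. 2**order base pages."""
    return (1 << order) * page_size


# Order-8 folios for the two example base page sizes.
for page_size in (4 * KIB, 64 * KIB):
    size = folio_bytes(8, page_size)
    print(f"PAGE_SIZE={page_size // KIB}K order=8 -> {size // MIB} MiB")
```

So an order-8 folio would be 1 MiB with 4K pages and 16 MiB with 64K pages,
which is already at or above some of the RPC payload limits listed for the
filesystems in this mail.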

Willy is arguing that network filesystems should, except in certain very
special situations (eg. O_SYNC), only write whole folios (limited to EOF).

Some network filesystems, however, currently keep track of which byte ranges
are modified within a dirty page (AFS does; NFS seems to also) and only write
out the modified data.

Also, there are limits to the maximum RPC payload sizes, so writing back large
pages may necessitate multiple writes, possibly to multiple servers.

What I'm trying to do is collate each network filesystem's properties (I'm
including FUSE in that).

So we have the following filesystems:

 Plan9
 - Doesn't track bytes
 - Only writes single pages

 AFS
 - Max RPC payload theoretically ~5.5 TiB (OpenAFS), ~16EiB (Auristor/kAFS)
 - kAFS (Linux kernel)
   - Tracks bytes, only writes back what changed
   - Writes from up to 65535 contiguous pages.
 - OpenAFS/Auristor (UNIX/Linux)
   - Deal with cache-sized blocks (configurable, but something from 8K to 2M),
     reads and writes in these blocks
 - OpenAFS/Auristor (Windows)
   - Track bytes, write back only what changed

 Ceph
 - File divided into objects (typically 2MiB in size), which may be scattered
   over multiple servers.
 - Max RPC size is therefore object size.
 - Doesn't track bytes.

 CIFS/SMB
 - Writes back just changed bytes immediately under some circumstances
 - Doesn't track bytes and writes back whole pages otherwise.
 - SMB3 has a max RPC size of 16MiB, with a default of 4MiB

 FUSE
 - Doesn't track bytes.
 - Max 'RPC' size of 256 pages (I think).

 NFS
 - Tracks modified bytes within a page.
 - Max RPC size of 1MiB.
 - Files may be constructed of objects scattered over different servers.

 OrangeFS
 - Doesn't track bytes.
 - Multipage writes possible.

If you could help me fill in the gaps, that would be great.

Thanks,
David
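
As a rough illustration of why these properties matter, the sketch below
counts the write RPCs needed to flush one dirty folio, either whole or as just
the modified byte range, under a given maximum RPC payload.  The function
names and the example dirty range are invented for illustration; the 1MiB
payload limit is simply one of the figures listed above, and alignment, object
striping and clamping to EOF are all ignored.

```python
# Hypothetical model of writeback cost; not taken from any filesystem's code.

KIB = 1024
MIB = 1024 * KIB


def rpcs_needed(nbytes: int, max_rpc: int) -> int:
    """Number of write RPCs to send nbytes when each carries at most max_rpc."""
    return (nbytes + max_rpc - 1) // max_rpc  # integer ceiling division


def writeback_cost(folio_size, dirty_start, dirty_end, max_rpc, tracks_bytes):
    """Return (bytes_sent, rpc_count) for flushing one dirty folio.

    tracks_bytes=True  -> only the dirty range [dirty_start, dirty_end) is sent.
    tracks_bytes=False -> the whole folio is sent.
    """
    nbytes = (dirty_end - dirty_start) if tracks_bytes else folio_size
    return nbytes, rpcs_needed(nbytes, max_rpc)


# Example: a 2 MiB folio with 100 bytes modified and a 1 MiB max RPC payload.
print(writeback_cost(2 * MIB, 4096, 4196, 1 * MIB, tracks_bytes=True))
print(writeback_cost(2 * MIB, 4096, 4196, 1 * MIB, tracks_bytes=False))
```

In this toy case, byte tracking sends 100 bytes in one RPC, whereas
whole-folio writeback sends 2 MiB in two RPCs.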