Message-ID: <90a2a17aeae0447793496426d21794a3b0f7c197.camel@redhat.com>
Subject: Re: Canvassing for network filesystem write size vs page size
From: Jeff Layton
To: David Howells, Anna Schumaker, Trond Myklebust, Steve French,
    Dominique Martinet, Mike Marshall, Miklos Szeredi
Cc: "Matthew Wilcox (Oracle)", Shyam Prasad N, Linus Torvalds,
    linux-cachefs@redhat.com, linux-afs@lists.infradead.org,
    linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org,
    ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net,
    devel@lists.orangefs.org, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Date: Thu, 05 Aug 2021 14:50:30 -0400
In-Reply-To: <1219713.1628181333@warthog.procyon.org.uk>
References: <1017390.1628158757@warthog.procyon.org.uk>
    <1170464.1628168823@warthog.procyon.org.uk>
    <1186271.1628174281@warthog.procyon.org.uk>
    <1219713.1628181333@warthog.procyon.org.uk>
X-Mailing-List: linux-nfs@vger.kernel.org

On Thu, 2021-08-05 at 17:35 +0100, David Howells wrote:
> With Willy's upcoming folio changes, from a filesystem point of view, we're
> going to be looking at folios instead of pages, where:
>
>  - a folio is a contiguous collection of pages;
>
>  - each page in the folio might be a standard PAGE_SIZE page (4K or 64K, say)
>    or a huge page (say 2M each);
>
>  - a folio has one dirty flag and one writeback flag that apply to all
>    constituent pages;
>
>  - a complete folio currently is limited to PMD_SIZE or order 8, but could
>    theoretically go up to about 2GiB before various integer fields have to be
>    modified (not to mention the memory allocator).
>
> Willy is arguing that network filesystems should, except in certain very
> special situations (e.g. O_SYNC), only write whole folios (limited to EOF).
>
> Some network filesystems, however, currently keep track of which byte ranges
> are modified within a dirty page (AFS does; NFS seems to also) and only write
> out the modified data.
>
> Also, there are limits to the maximum RPC payload sizes, so writing back large
> pages may necessitate multiple writes, possibly to multiple servers.
>
> What I'm trying to do is collate each network filesystem's properties (I'm
> including FUSE in that).
>
> So we have the following filesystems:
>
>  Plan9
>  - Doesn't track bytes
>  - Only writes single pages
>
>  AFS
>  - Max RPC payload theoretically ~5.5 TiB (OpenAFS), ~16EiB (Auristor/kAFS)
>  - kAFS (Linux kernel)
>    - Tracks bytes, only writes back what changed
>    - Writes from up to 65535 contiguous pages.
>  - OpenAFS/Auristor (UNIX/Linux)
>    - Deal with cache-sized blocks (configurable, but something from 8K to 2M),
>      reads and writes in these blocks
>  - OpenAFS/Auristor (Windows)
>    - Track bytes, write back only what changed
>
>  Ceph
>  - File divided into objects (typically 2MiB in size), which may be scattered
>    over multiple servers.

The default object size is 4M in modern cephfs clusters, but the rest is
correct.

>  - Max RPC size is therefore object size.
>  - Doesn't track bytes.
>
>  CIFS/SMB
>  - Writes back just changed bytes immediately under some circumstances

cifs.ko can also just do writes to specific byte ranges synchronously
when it doesn't have the ability to use the cache (i.e. no oplock or
lease). CephFS also does this when it doesn't have the necessary
capabilities (aka caps) to use the pagecache.

If we want to add infrastructure for netfs writeback, then it would be
nice to consider similar infrastructure to handle those cases as well.

>  - Doesn't track bytes and writes back whole pages otherwise.
>  - SMB3 has a max RPC size of 16MiB, with a default of 4MiB
>
>  FUSE
>  - Doesn't track bytes.
>  - Max 'RPC' size of 256 pages (I think).
>
>  NFS
>  - Tracks modified bytes within a page.
>  - Max RPC size of 1MiB.
>  - Files may be constructed of objects scattered over different servers.
>
>  OrangeFS
>  - Doesn't track bytes.
>  - Multipage writes possible.
>
> If you could help me fill in the gaps, that would be great.

-- 
Jeff Layton
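
A rough sketch of the RPC-size point in the quoted text: a dirty folio
larger than a filesystem's maximum RPC payload has to go out as several
writes, and a filesystem that tracks dirty byte ranges can send far less
than one that writes whole folios. The names below (WriteRequest,
split_writeback, folio_size), the 4K base page size and the order-9 (2MiB)
folio are assumptions made for illustration and are not taken from the
kernel or from any particular filesystem; only the 1MiB NFS limit comes
from the list above.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096          # assumption: 4K base pages
NFS_MAX_RPC = 1 << 20     # 1MiB, the NFS figure quoted in the list above


@dataclass(frozen=True)
class WriteRequest:
    """One hypothetical write RPC: file offset and payload length in bytes."""
    offset: int
    length: int


def folio_size(order, page_size=PAGE_SIZE):
    """Size in bytes of a folio made of 2**order base pages."""
    return page_size << order


def split_writeback(start, length, max_rpc, eof):
    """Split the byte range [start, start + length) into write requests.

    Each request carries at most max_rpc bytes, and nothing past eof is
    written (the "limited to EOF" rule from the quoted text).
    """
    if max_rpc <= 0:
        raise ValueError("max_rpc must be positive")
    end = min(start + length, eof)
    requests = []
    pos = start
    while pos < end:
        chunk = min(max_rpc, end - pos)
        requests.append(WriteRequest(pos, chunk))
        pos += chunk
    return requests


if __name__ == "__main__":
    eof = 10 << 20  # hypothetical 10MiB file
    # Whole-folio writeback of an order-9 folio at offset 0.
    whole = split_writeback(0, folio_size(9), NFS_MAX_RPC, eof)
    # Byte-range tracking: only 100 modified bytes inside the same folio.
    tracked = split_writeback(4096, 100, NFS_MAX_RPC, eof)
    print(len(whole), sum(r.length for r in whole))
    print(len(tracked), sum(r.length for r in tracked))
```

Under these assumptions the whole-folio case needs two 1MiB writes, while
the tracked case needs a single 100-byte write, which is the trade-off
being canvassed in this thread.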