Date: Wed, 28 Jul 2021 21:29:31 -0400
From: Zygo Blaxell
To: "J. Bruce Fields"
Cc: Neal Gompa, NeilBrown, Wang Yugui, Christoph Hellwig, Josef Bacik,
 Chuck Lever, Chris Mason, David Sterba, Alexander Viro, linux-fsdevel,
 linux-nfs@vger.kernel.org, Btrfs BTRFS
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
Message-ID: <20210729012931.GK10170@hungrycats.org>
In-Reply-To: <20210728191431.GA3152@fieldses.org>
References: <162742539595.32498.13687924366155737575.stgit@noble.brown>
 <20210728125819.6E52.409509F4@e16-tech.com>
 <20210728140431.D704.409509F4@e16-tech.com>
 <162745567084.21659.16797059962461187633@noble.neil.brown.name>
 <20210728191431.GA3152@fieldses.org>

On Wed, Jul 28, 2021 at 03:14:31PM -0400, J. Bruce Fields wrote:
> On Wed, Jul 28, 2021 at 08:26:12AM -0400, Neal Gompa wrote:
> > I think this is behavior people generally expect, but I wonder what
> > the consequences of this would be with huge numbers of subvolumes.
> > If there are hundreds or thousands of them (which is quite possible
> > on SUSE systems, for example, with its auto-snapshotting regime),
> > this would be a mess, wouldn't it?
>
> I'm surprised that btrfs is special here. Doesn't anyone have
> thousands of lvm snapshots? Or is it that they do but they're not
> normally mounted?

Unprivileged users can't create lvm snapshots as easily or quickly as
they can run mkdir (well, ok, mkdir and fsync). lvm also doesn't scale
well past a few dozen snapshots of the same original volume:
performance degrades linearly with the number of snapshots whenever the
original LV is modified.
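To make the difference concrete, a minimal sketch (the VG name "vg0",
the LV name "home", and the paths are all made up for the example):

    # lvm: needs root, and reserves a fixed-size COW area up front,
    # which fills up as the origin LV is written
    lvcreate -s -n home-snap -L 1G vg0/home

    # btrfs: any user who can reach the paths, nothing to preallocate
    # (~/project has to be a subvolume rather than a plain directory)
    btrfs subvolume snapshot ~/project ~/project.snap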
Bruce Fields" Cc: Neal Gompa , NeilBrown , Wang Yugui , Christoph Hellwig , Josef Bacik , Chuck Lever , Chris Mason , David Sterba , Alexander Viro , linux-fsdevel , linux-nfs@vger.kernel.org, Btrfs BTRFS Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly Message-ID: <20210729012931.GK10170@hungrycats.org> References: <162742539595.32498.13687924366155737575.stgit@noble.brown> <20210728125819.6E52.409509F4@e16-tech.com> <20210728140431.D704.409509F4@e16-tech.com> <162745567084.21659.16797059962461187633@noble.neil.brown.name> <20210728191431.GA3152@fieldses.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20210728191431.GA3152@fieldses.org> User-Agent: Mutt/1.10.1 (2018-07-13) Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org On Wed, Jul 28, 2021 at 03:14:31PM -0400, J. Bruce Fields wrote: > On Wed, Jul 28, 2021 at 08:26:12AM -0400, Neal Gompa wrote: > > I think this is behavior people generally expect, but I wonder what > > the consequences of this would be with huge numbers of subvolumes. If > > there are hundreds or thousands of them (which is quite possible on > > SUSE systems, for example, with its auto-snapshotting regime), this > > would be a mess, wouldn't it? > > I'm surprised that btrfs is special here. Doesn't anyone have thousands > of lvm snapshots? Or is it that they do but they're not normally > mounted? Unprivileged users can't create lvm snapshots as easily or quickly as using mkdir (well, ok, mkdir and fssync). lvm doesn't scale very well past more than a few dozen snapshots of the same original volume, and performance degrades linearly in the number of snapshots if the original LV is modified. btrfs is the opposite: users can create and delete as many snapshots as they like, at a cost more expensive than mkdir but less expensive than 'cp -a', and users only pay IO costs for writes to the subvols they modify. So some btrfs users use snapshots in places where more traditional tools like 'cp -a' or 'git checkout' are used on other filesystems. e.g. a build system might make a snapshot of a git working tree containing a checked out and built baseline revision, and then it might do a loop where it makes a snapshot, applies one patch from an integration branch in the snapshot directory, and incrementally builds there. The next revision makes a snapshot of its parent revision's subvol and builds the next patch. If there are merges in the integration branch, then the builder can go back to parent revisions, create a new snapshot, apply the patch, and build in a snapshot on both sides of the merge. After testing picks a winner, the builder can simply delete all the snapshots except the one for the version that won testing (there is no requirement to commit the snapshot to the origin LV as in lvm, either can be destroyed without requiring action to preserve the other). You can do a similar thing with overlayfs, but it runs into problems with all the mount points. In btrfs, the mount points are persistent because they're built into the filesystem. With overlayfs, you have to save and restore them so they persist across reboots (unless that feature has been added since I last looked). I'm looking at a few machines here, and if all the subvols are visible to 'df', its output would be somewhere around 3-5 MB. 
You can do a similar thing with overlayfs, but it runs into problems
with all the mount points. In btrfs, the mount points are persistent
because they're built into the filesystem; with overlayfs, you have to
save and restore them yourself so they persist across reboots (unless
that feature has been added since I last looked).

I'm looking at a few machines here, and if all the subvols were visible
to 'df', its output would be somewhere around 3-5 MB.

That's too much--we'd have to hack up df to not show the same btrfs
twice... as well as every monitoring tool that reports free space...
which sounds similar to the problems we're trying to avoid. Ideally
there would be a way to turn this on or off. It creates a set of new
problems that is the complement of the set we're trying to fix with
this change.

> --b.
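P.S. If anyone wants to reproduce that df estimate, the arithmetic is
just subvol count times a typical df line length. ~100 bytes per line
is my guess, and 'btrfs subvolume list' needs root on a default setup:

    # count the subvols, then figure ~100 bytes per df output line
    btrfs subvolume list / | wc -l
    # e.g. 40000 subvols * ~100 bytes/line comes to ~4 MB of df output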