From: "NeilBrown"
To: "Qu Wenruo"
Cc: "Zygo Blaxell", "Neal Gompa", "Wang Yugui", "Christoph Hellwig",
 "Josef Bacik", "J. Bruce Fields", "Chuck Lever", "Chris Mason",
 "David Sterba", "Alexander Viro", "linux-fsdevel",
 linux-nfs@vger.kernel.org, "Btrfs BTRFS"
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
In-reply-to: <341403c0-a7a7-f6c8-5ef6-2d966b1907a8@gmx.com>
References: <162742539595.32498.13687924366155737575.stgit@noble.brown>,
 <20210728125819.6E52.409509F4@e16-tech.com>,
 <20210728140431.D704.409509F4@e16-tech.com>,
 <162745567084.21659.16797059962461187633@noble.neil.brown.name>,
 <162751265073.21659.11050133384025400064@noble.neil.brown.name>,
 <20210729023751.GL10170@hungrycats.org>,
 <162752976632.21659.9573422052804077340@noble.neil.brown.name>,
 <20210729232017.GE10106@hungrycats.org>,
 <162761259105.21659.4838403432058511846@noble.neil.brown.name>,
 <341403c0-a7a7-f6c8-5ef6-2d966b1907a8@gmx.com>
Date: Fri, 30 Jul 2021 15:58:07 +1000
Message-id: <162762468711.21659.161298577376336564@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, 30 Jul 2021, Qu Wenruo wrote:
>
> On 2021/7/30 10:36 AM, NeilBrown wrote:
> >
> > I've been pondering all the excellent feedback, and what I have learnt
> > from examining the code in btrfs, and I have developed a different
> > perspective.
>
> Great! Some new developers into the btrfs realm! :-)
>
> >
> > Maybe "subvol" is a poor choice of name because it conjures up
> > connections with the Volumes in LVM, and btrfs subvols are very different
> > things. Btrfs subvols are really just subtrees that can be treated as a
> > unit for operations like "clone" or "destroy".
> >
> > As such, they don't really deserve separate st_dev numbers.
> >
> > Maybe the different st_dev numbers were introduced as a "cheap" way to
> > extend the size of the inode-number space. Like many "cheap" things, it
> > has hidden costs.
> >
> > Maybe objects in different subvols should still be given different inode
> > numbers.
> > This would be problematic on 32bit systems, but much less so on
> > 64bit systems.
> >
> > The patch below, which is just a proof-of-concept, changes btrfs to
> > report a uniform st_dev, and different (64bit) st_ino in different subvols.
> >
> > It has problems:
> >  - it will break any 32bit readdir and 32bit stat. I don't know how big
> >    a problem that is these days (ino_t in the kernel is "unsigned long",
> >    not "unsigned long long"; that surprised me).
> >  - It might break some user-space expectations. One thing I have learnt
> >    is not to make any assumption about what other people might expect.
>
> Wouldn't any filesystem boundary check fail to stop at subvolume boundary?

You mean like "du -x"? Yes. You would lose the misleading illusion
that there are multiple filesystems. That is one user-expectation that
would need to be addressed before people opt in.

>
> Then it will go through the full btrfs subvolumes/snapshots, which can
> be super slow.
>
> >
> > However, it would be quite easy to make this opt-in (or opt-out) with a
> > mount option, so that people who need the current inode numbers and will
> > accept the current breakage can keep working.
> >
> > I think this approach would be a net-win for NFS export, whether BTRFS
> > supports it directly or not. I might post a patch which modifies NFS to
> > intuit improved inode numbers for btrfs exports....
>
> Some extra ideas, but not familiar with VFS enough to be sure.
>
> Can we generate "fake" superblock for each subvolume?

I don't see how that would help. Either subvols are like filesystems
and appear in /proc/mounts, or they aren't like filesystems and don't
get different st_dev. Either of these outcomes can be achieved without
fake superblocks. If you really need BTRFS subvols to have some
properties of filesystems but not all, then you are in for a whole world
of pain.

Maybe btrfs subvols should be treated more like XFS "managed trees".
At least there you have precedent and someone else to share the pain.
Maybe we should train people to use "quota" to check the usage of a
subvol, rather than "du" (which will stop working with my patch if it
contains refs to other subvols) or "df" (which already doesn't work), or
"btrfs df".

> Like using the subvolume UUID to replace the FSID of each subvolume.
> Could that mitigate the problem?

Which problem, exactly? My first approach to making subvols work on NFS
took essentially that approach. It was seen (quite reasonably) as a
hack to work around poor behaviour in btrfs.

Given that NFS has always seen all of a btrfs filesystem as having a
uniform fsid, I'm now of the opinion that we don't want to change that,
but should just fix the duplicate-inode-number problem.

If I could think of some way for NFSD to see different inode numbers
than VFS, I would push hard for fixing nfsd by giving it more sane inode
numbers.

Thanks,
NeilBrown
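
For readers following the "du -x" point above, here is a rough sketch of
the kind of filesystem boundary check such tools perform: compare each
entry's st_dev with that of the starting directory and refuse to descend
when it differs. This is illustrative Python only, not code from du or
from the patch, and the name walk_one_filesystem is hypothetical.

```python
import os


def walk_one_filesystem(root):
    """Yield paths below root without crossing an st_dev boundary.

    Hypothetical helper that mimics the idea behind "du -x": an entry
    whose st_dev differs from the starting directory's is treated as
    belonging to another filesystem and is skipped.
    """
    root_dev = os.lstat(root).st_dev
    pending = [root]
    while pending:
        directory = pending.pop()
        with os.scandir(directory) as entries:
            for entry in entries:
                st = entry.stat(follow_symlinks=False)
                if st.st_dev != root_dev:
                    # Different st_dev: looks like another filesystem.
                    continue
                yield entry.path
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
```

If every subvol reported the same st_dev, the comparison above would
never fail inside a btrfs filesystem, so a walk like this would descend
into every subvolume and snapshot. That is the behaviour change being
discussed in this thread.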