From: "NeilBrown"
To: "Qu Wenruo"
Cc: "Zygo Blaxell", "Neal Gompa", "Wang Yugui", "Christoph Hellwig",
 "Josef Bacik", "J. Bruce Fields", "Chuck Lever", "Chris Mason",
 "David Sterba", "Alexander Viro", "linux-fsdevel",
 linux-nfs@vger.kernel.org, "Btrfs BTRFS"
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
References: <162742539595.32498.13687924366155737575.stgit@noble.brown>,
 <20210728125819.6E52.409509F4@e16-tech.com>,
 <20210728140431.D704.409509F4@e16-tech.com>,
 <162745567084.21659.16797059962461187633@noble.neil.brown.name>, ,
 <162751265073.21659.11050133384025400064@noble.neil.brown.name>,
 <20210729023751.GL10170@hungrycats.org>,
 <162752976632.21659.9573422052804077340@noble.neil.brown.name>,
 <20210729232017.GE10106@hungrycats.org>,
 <162761259105.21659.4838403432058511846@noble.neil.brown.name>,
 <341403c0-a7a7-f6c8-5ef6-2d966b1907a8@gmx.com>,
 <162762468711.21659.161298577376336564@noble.neil.brown.name>,
Date: Fri, 30 Jul 2021 16:53:43 +1000
Message-id: <162762802395.21659.5310176078177217626@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, 30 Jul 2021, Qu Wenruo wrote:
> >
> > You mean like "du -x"?? Yes. You would lose the misleading illusion
> > that there are multiple filesystems. That is one user-expectation that
> > would need to be addressed before people opt-in
>
> OK, forgot it's an opt-in feature, then it's less an impact.

The hope would have to be that everyone would eventually opt-in once all
issues were understood.

>
> Really not familiar with NFS/VFS, thus some ideas from me may sounds
> super crazy.
>
> Is it possible that, for nfsd to detect such "subvolume" concept by its
> own, like checking st_dev and the fsid returned from statfs().
>
> Then if nfsd find some boundary which has different st_dev, but the same
> fsid as its parent, then it knows it's a "subvolume"-like concept.
>
> Then do some local inode number mapping inside nfsd?
> Like use the highest 20 bits for different subvolumes, while the
> remaining 44 bits for real inode numbers.
>
> Of-course, this is still a workaround...

Yes, it would certainly be possible to add some hacks to nfsd to fix the
immediate problem, and we could probably even create some well-defined
interfaces into btrfs to extract the required information so that it
wasn't too hackish.

Maybe that is what we will have to do. But I'd rather not hack NFSD
while there is any chance that a more complete solution will be found.

I'm not quite ready to give up on the idea of squeezing all btrfs inodes
into a 64bit number space. 24bits of subvol and 40 bits of inode?
Make the split a mkfs or mount option?
Maybe hand out inode numbers to subvols in 2^32 chunks so each subvol
(which has ever been accessed) has a mapping from the top 32 bits of the
objectid to the top 32 bits of the inode number.

We don't need something that is theoretically perfect (that's not
possible anyway as we don't have 64bits of device numbers). We just
need something that is practical and scales adequately. If you have
petabytes of storage, it is reasonable to spend a gigabyte of memory on
a lookup table(?).

If we can make inode numbers unique, we can possibly leave the st_dev
changing at subvols so that "du -x" works as currently expected.

One thought I had was to use a strong hash to combine the subvol object
id and the inode object id into a 64bit number. What is the chance of
a collision in practice :-)

Thanks,
NeilBrown
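
To make the fixed-split idea and the hash question above a little more
concrete, here is a minimal user-space sketch in Python. It is not
btrfs or nfsd code. The names pack_ino, unpack_ino and
collision_probability are hypothetical, the 24/40 split is just one of
the values floated in this thread, and the collision estimate assumes
an idealised hash whose 64-bit outputs are uniformly random (the usual
birthday approximation).

```python
import math
from typing import Tuple

# Assumed split for illustration only; the thread also mentions 20/44.
SUBVOL_BITS = 24
INODE_BITS = 64 - SUBVOL_BITS


def pack_ino(subvol_id: int, objectid: int) -> int:
    """Pack a subvolume id and a per-subvolume objectid into one 64-bit number."""
    if not 0 <= subvol_id < (1 << SUBVOL_BITS):
        raise OverflowError("subvolume id does not fit in the subvol field")
    if not 0 <= objectid < (1 << INODE_BITS):
        raise OverflowError("objectid does not fit in the inode field")
    return (subvol_id << INODE_BITS) | objectid


def unpack_ino(ino: int) -> Tuple[int, int]:
    """Split a packed number back into (subvol_id, objectid)."""
    return ino >> INODE_BITS, ino & ((1 << INODE_BITS) - 1)


def collision_probability(n_inodes: int, bits: int = 64) -> float:
    """Birthday approximation: P(collision) ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are independent and uniform over 2^bits values.
    """
    return -math.expm1(-n_inodes * (n_inodes - 1) / 2.0 ** (bits + 1))
```

With a fixed split the failure mode is explicit: an id that does not
fit raises an error rather than silently aliasing another inode. With
a hash there is no hard limit, but under the uniformity assumption the
chance of at least one collision among 2^32 hashed inodes comes out at
roughly 0.39, so "in practice" depends heavily on how many inodes a
filesystem is expected to hold.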