From: "NeilBrown"
To: "Josef Bacik"
Cc: "Chris Mason", "David Sterba", linux-fsdevel@vger.kernel.org, "Linux NFS list", "Btrfs BTRFS"
Subject: Re: [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness.
In-reply-to:
References: <162848123483.25823.15844774651164477866.stgit@noble.brown>,
Date: Thu, 12 Aug 2021 08:13:23 +1000
Message-id: <162872000356.22261.854151210687377005@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Wed, 11 Aug 2021, Josef Bacik wrote:
>
> I think this is a step in the right direction, but I want to figure out a way to
> accomplish this without magical mount points that users must be aware of.

magic mount *options* ???

>
> I think the stat() st_dev ship has sailed, we're stuck with that. However
> Christoph does have a valid point where it breaks the various info spit out by
> /proc. You've done a good job with the treeid here, but it still makes it
> impossible for somebody to map the st_dev back to the correct mount.

The ship might have sailed, but it is not watertight. And as the world
is round, it can still come back to bite us from behind. Anything can be
transitioned away from, whether it is devfs or 32-bit time or giving
different device numbers to different file-trees.

The linkage between device number and filesystem is quite strong. We
could modify all of /proc and /sys/ and audit and whatever else to
report the fake device number, but we cannot get the fake device number
into the mount table (without making the mount table unmanageably
large).

And if subtrees aren't in the mount-table for the NFS server, I don't
think they should be in the mount-table of the NFS client. So we cannot
export them to NFS.

I understand your dislike for mount options. An alternative with
different costs and benefits would be to introduce a new filesystem type
- btrfs2 or maybe betrfs. This would provide numdevs=1 semantics and do
whatever we decided was best with inode numbers. How much would you
hate that?

>
> I think we aren't going to solve that problem, at least not with stat(). I
> think with statx() spitting out treeid we have given userspace a way to
> differentiate subvolumes, and so we should fix statx() to spit out the super
> block device, that way new userspace things can do their appropriate lookup if
> they so choose.

I don't think we should normalize having multiple devnums per filesystem
by encoding it in statx(). It *would* make sense to add a btrfs ioctl
which reports the real device number of a file. Tools that really need
to work with btrfs could use that, but it would always be obvious that
it was an exception.

>
> This leaves the problem of nfsd. Can you just integrate this new treeid into
> nfsd, and use that to either change the ino within nfsd itself, or do something
> similar to what your first patchset did and generate a fsid based on the treeid?

I would only want nfsd to change the inode number. I no longer think it
is acceptable for nfsd to report a different device number (as I
mentioned above).
I would want the new inode number to be explicitly provided by the
filesystem. Whether that is a new export_operation or a new field in
'struct kstat' doesn't really bother me. I'd *prefer* it to be st_ino,
but I can live without that.

On the topic of inode numbers.... I've recently learned that btrfs
never reuses inode (objectid) numbers (except possibly after an
unmount). Equally it doesn't re-use subvol numbers. How much does this
contribute to the 64 bits not being enough for subtree+inode?

It would be nice if we could be comfortable limiting the objectid number
to 40 bits and the root.objectid (filetree) number to 24 bits, and
combine them into a 64-bit inode number.

If we added an inode number reuse scheme that was suitably performant,
would that make this possible? That would remove the need for a treeid,
and allow us to use project-id to identify subtrees.

>
> Mount options are messy, and are just going to lead to distros turning them on
> without understanding what's going on and then we have to support them forever.
> I want to get this fixed in a way that we all hate the least with as little
> opportunity for confused users to make bad decisions. Thanks,

Hence my question: how much do you hate creating a new filesystem type
to fix the problems?

Thanks,
NeilBrown