From: "NeilBrown"
To: "Amir Goldstein"
Cc: "Al Viro", "Miklos Szeredi", "Christoph Hellwig", "Josef Bacik",
    "J. Bruce Fields", "Chuck Lever", "Chris Mason", "David Sterba",
    "linux-fsdevel", "Linux NFS list", "Btrfs BTRFS"
Subject: Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
Date: Tue, 03 Aug 2021 07:24:58 +1000
Message-id: <162793949857.32159.8101709423759352396@noble.neil.brown.name>
X-Mailing-List: linux-nfs@vger.kernel.org

On Mon, 02 Aug 2021, Amir Goldstein wrote:
> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown wrote:
> >
> > On Mon, 02 Aug 2021, Al Viro wrote:
> > > On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
> > >
> > > > It think we need to bite-the-bullet and decide that 64bits is not
> > > > enough, and in fact no number of bits will ever be enough. overlayfs
> > > > makes this clear.
> > >
> > > Sure - let's go for broke and use XML. Oh, wait - it's 8 months too
> > > early...
> > >
> > > > So I think we need to strongly encourage user-space to start using
> > > > name_to_handle_at() whenever there is a need to test if two things are
> > > > the same.
> > >
> > > ... and forgetting the inconvenient facts, such as that two different
> > > fhandles may correspond to the same object.
> >
> > Can they? They certainly can if the "connectable" flag is passed.
> > name_to_handle_at() cannot set that flag.
> > nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
> > perfect. However it is the best that can be done over NFS.
> >
> > Or is there some other situation where two different filehandles can be
> > reported for the same inode?
> >
> > Do you have a better suggestion?
> >
>
> Neil,
>
> I think the plan of "changing the world" is not very realistic.

I disagree.
It has happened before, it will happen again. The only difference
about my proposal is that I'm suggesting the change be proactive rather
than reactive.

> Sure, *some* tools can be changed, but all of them?

We only need to change the tools that notice there is a problem. So it
is important to minimize the effect on existing tools, even when we
cannot reduce it to zero. We then fix things that are likely to see a
problem, or that actually do. And we clearly document the behaviour and
how to deal with it, for code that we cannot directly affect.

Remember: there is ALREADY breakage that has been fixed. btrfs does
*not* behave like a "normal" filesystem. Nor does NFS. Multiple tools
have been adjusted to work with them. Let's not pretend that will never
happen again, but instead use the dynamic to drive evolution in the way
we choose.

>
> I went back to read your initial cover letter to understand the
> problem and what I mostly found there was that the view of
> /proc/x/mountinfo was hiding information that is important for
> some tools to understand what is going on with btrfs subvols.

That was where I started, but not where I ended. There are *lots* of
places that currently report inconsistent information for btrfs subvols.

>
> Well I am not a UNIX history expert, but I suppose that
> /proc/PID/mountinfo was created because /proc/mounts and
> /proc/PID/mounts no longer provided tool with all the information
> about Linux mounts.
>
> Maybe it's time for a new interface to query the more advanced
> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
> traversing mounts that is not limited to reporting all entries inside
> a single page. I suppose we could go for some hierarchical view
> under /proc/PID/mounttree. I don't know - new API is hard.

Yes, exactly - but not just for mounts.
Yes, we need new APIs (Because the old ones have been broken in various
ways). That is exactly what I'm proposing.
But "fixing" mountinfo turns out to be little more than rearranging
deck-chairs on the Titanic.

>
> In any case, instead of changing st_dev and st_ino or changing the
> world to work with file handles, why not add inode generation (and
> maybe subvol id) to statx().

The enormous benefit of filehandles is that they are supported by
kernels running today. As others have commented, they also work over
NFS.
But I would be quite happy to see more information made available
through statx - providing the meaning of that information was clearly
specified - both what can be assumed about it and what cannot.

Thanks,
NeilBrown

> filesystem that care enough will provide this information and tools that
> care enough will use it.
>
> Thanks,
> Amir.
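
For readers who have not used the interface, here is a rough sketch of the identity test discussed in the thread: ask the kernel for a file handle for each path via name_to_handle_at() and compare the results. It is illustrative only and is not code from this thread. The helper names `get_handle` and `same_object` are hypothetical, the call goes through ctypes because Python's standard library has no wrapper for it, and the comparison is only meaningful when both paths live on the same filesystem (the sketch does not check that). As noted in the thread, two different handles may in some situations refer to the same object, so a mismatch is weaker evidence than a match.

```python
import ctypes
import os

AT_FDCWD = -100        # "relative to the current directory" (Linux value)
MAX_HANDLE_SZ = 128    # upper bound on handle size, as in <fcntl.h>


class FileHandle(ctypes.Structure):
    """Mirror of struct file_handle with room for the largest handle."""
    _fields_ = [
        ("handle_bytes", ctypes.c_uint),
        ("handle_type", ctypes.c_int),
        ("f_handle", ctypes.c_ubyte * MAX_HANDLE_SZ),
    ]


# Load the C library of the running process; use_errno lets us read errno.
libc = ctypes.CDLL(None, use_errno=True)


def get_handle(path):
    """Return (handle_type, handle_bytes) for path (hypothetical helper).

    Raises OSError if the filesystem cannot produce file handles.
    """
    fh = FileHandle(handle_bytes=MAX_HANDLE_SZ)
    mount_id = ctypes.c_int()
    rc = libc.name_to_handle_at(
        AT_FDCWD, os.fsencode(path),
        ctypes.byref(fh), ctypes.byref(mount_id),
        0,  # flags: 0 means a trailing symlink is not followed
    )
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fh.handle_type, bytes(fh.f_handle[:fh.handle_bytes])


def same_object(path_a, path_b):
    """True if both paths yield identical file handles (hypothetical helper).

    Assumes both paths are on the same filesystem; handles from different
    filesystems are not comparable and this sketch does not detect that.
    """
    return get_handle(path_a) == get_handle(path_b)
```

A caller would use `same_object(a, b)` where it would otherwise compare st_dev and st_ino, and would need to decide what to do on filesystems where the call fails with an error such as EOPNOTSUPP.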