From: Amir Goldstein
Date: Fri, 30 Jul 2021 08:28:05 +0300
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
To: NeilBrown
Cc: Zygo Blaxell, Neal Gompa, Wang Yugui, Christoph Hellwig, Josef Bacik,
    "J. Bruce Fields", Chuck Lever, Chris Mason, David Sterba,
    Alexander Viro, linux-fsdevel, Linux NFS Mailing List, Btrfs BTRFS
X-Mailing-List: linux-nfs@vger.kernel.org

On Fri, Jul 30, 2021 at 5:41 AM NeilBrown wrote:
>
> I've been pondering all the excellent feedback, and what I have learnt
> from examining the code in btrfs, and I have developed a different
> perspective.
>
> Maybe "subvol" is a poor choice of name because it conjures up
> connections with the Volumes in LVM, and btrfs subvols are very different
> things.  Btrfs subvols are really just subtrees that can be treated as a
> unit for operations like "clone" or "destroy".
>
> As such, they don't really deserve separate st_dev numbers.
>
> Maybe the different st_dev numbers were introduced as a "cheap" way to
> extend the size of the inode-number space.  Like many "cheap" things, it
> has hidden costs.
>
> Maybe objects in different subvols should still be given different inode
> numbers.  This would be problematic on 32bit systems, but much less so on
> 64bit systems.
>
> The patch below, which is just a proof-of-concept, changes btrfs to
> report a uniform st_dev, and different (64bit) st_ino in different subvols.
>
> It has problems:
>  - it will break any 32bit readdir and 32bit stat.  I don't know how big
>    a problem that is these days (ino_t in the kernel is "unsigned long",
>    not "unsigned long long".  That surprised me).
>  - It might break some user-space expectations.  One thing I have learnt
>    is not to make any assumption about what other people might expect.
>
> However, it would be quite easy to make this opt-in (or opt-out) with a
> mount option, so that people who need the current inode numbers and will
> accept the current breakage can keep working.
>
> I think this approach would be a net-win for NFS export, whether BTRFS
> supports it directly or not.  I might post a patch which modifies NFS to
> intuit improved inode numbers for btrfs exports....
>
> So: how would this break your use-case??

The simple cases are find -xdev and du -x, which expect the st_dev change,
but that can be excused if opting in to a unified st_dev namespace.

The harder problem is collisions, which are not even that hard to hit
with an unlimited number of snapshots.
The 'diff' tool demonstrates the implications of collisions for different
objects on userspace.
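
A minimal sketch of why such a collision matters to userspace, assuming
a tool that treats two paths with equal (st_dev, st_ino) as the same
file and may therefore skip comparing their contents. Python's
os.path.samestat performs exactly that comparison. The Stat tuple and
all device and inode numbers below are hypothetical, chosen only for
illustration.

```python
import os.path
from collections import namedtuple

# Hypothetical stand-in for the two struct stat fields that matter here.
Stat = namedtuple("Stat", ["st_dev", "st_ino"])

# Per-subvolume st_dev: two different files may share an inode number,
# but the differing st_dev still tells them apart. (Made-up numbers.)
a_today = Stat(st_dev=0x31, st_ino=257)
b_today = Stat(st_dev=0x32, st_ino=257)

# Unified st_dev with a colliding st_ino: two different files now
# present the same (st_dev, st_ino) pair.
a_unified = Stat(st_dev=0x30, st_ino=257)
b_unified = Stat(st_dev=0x30, st_ino=257)

# samestat compares st_ino and st_dev only; it never reads file contents.
distinct_today = not os.path.samestat(a_today, b_today)
confused_unified = os.path.samestat(a_unified, b_unified)

print("kept apart with per-subvol st_dev:", distinct_today)
print("mistaken for one file with unified st_dev:", confused_unified)
```

Any tool that relies on this identity test would treat the two colliding
objects as one.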
See xfstest overlay/049 for a demonstration.

The overlayfs xino feature made a similar change to overlayfs with one
big difference - applications expect that all objects in an overlayfs
mount will have the same st_dev.  Also, overlayfs has prior knowledge of
the number of layers, so it is easier to parcel the ino namespace and
avoid collisions.

Thanks,
Amir.
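
To illustrate the point about parceling the ino namespace when the
number of layers is known up front, here is a rough sketch. It is not
the overlayfs implementation; the bit layout, the function names
(layer_bits_for, pack_ino) and the numbers are assumptions made for the
example. The idea is to reserve just enough high bits of a 64-bit inode
number for a layer index and keep the low bits for the underlying inode
number, refusing anything that would not fit.

```python
INO_BITS = 64  # width of the inode number reported to userspace


def layer_bits_for(num_layers):
    """High bits needed to tell num_layers layers apart (at least 1)."""
    return max(1, (num_layers - 1).bit_length())


def pack_ino(layer, ino, layer_bits):
    """Combine a layer index and an underlying inode number.

    Hypothetical layout: the layer index occupies the top layer_bits
    bits and the underlying inode number the remaining low bits.
    """
    low_bits = INO_BITS - layer_bits
    if layer >= (1 << layer_bits):
        raise OverflowError("layer index does not fit in reserved bits")
    if ino >= (1 << low_bits):
        raise OverflowError("underlying inode number uses reserved bits")
    return (layer << low_bits) | ino


# With 4 layers known in advance, 2 high bits are enough.
bits = layer_bits_for(4)
ino_a = pack_ino(1, 257, bits)  # inode 257 in layer 1
ino_b = pack_ino(2, 257, bits)  # inode 257 in layer 2
print(bits, hex(ino_a), hex(ino_b))
```

With a fixed, small layer count the reserved field stays narrow. With an
unbounded number of subvolumes or snapshots, the field either grows into
the space needed by real inode numbers or collisions have to be accepted,
which is the difficulty described above.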