Subject: Re: [PATCH/RFC 0/4] Attempt to make progress with btrfs dev number strangeness.
To: NeilBrown, Chris Mason, David Sterba
Cc: linux-fsdevel@vger.kernel.org, Linux NFS list, Btrfs BTRFS
References: <162848123483.25823.15844774651164477866.stgit@noble.brown>
From: Josef Bacik
Date: Tue, 10 Aug 2021 16:51:11 -0400
In-Reply-To: <162848123483.25823.15844774651164477866.stgit@noble.brown>
X-Mailing-List: linux-nfs@vger.kernel.org

On 8/8/21 11:55 PM, NeilBrown wrote:
> I continue to search for a way forward for btrfs so that its behaviour
> with respect to device numbers and subvols is somewhat coherent.
>
> This series implements some of the ideas in my "A Third perspective"[1],
> though with changes in various details.
>
> I introduce two new mount options, which default to
> no-change-in-behaviour.
>
> -o inumbits= causes inode numbers to be more unique across a whole btrfs
>              filesystem, and in many cases completely unique.  Mounting
>              with "-o inumbits=56" will resolve the NFS issues that
>              started me tilting at this particular windmill.
>
> -o numdevs=  can reduce the number of distinct devices reported by
>              stat(), either to 2 or to 1.
>              Both ease problems for sites that exhaust their supply of
>              device numbers.
>              '2' allows "du -x" to continue to work, but is otherwise
>              rather strange.
>              '1' breaks the use of "du -x" and similar to examine a
>              single subvol which might have subvol descendants, but
>              provides generally sane behaviour.
>              "-o numdevs=1" also forces inumbits to have a useful value.
>
> I introduce a "tree id" which can be discovered using statx().  Two
> files with the same dev and ino might still be different if the tree-ids
> are different.  Connected files with the same tree-id may be usefully
> considered to be related.
>
> I also change various /proc files (only when numdevs=1 is used) to
> provide extra information so they are useful with btrfs despite subvols.
> /proc/maps /proc/smaps /proc/locks /proc/X/fdinfo/Y are affected.
> The inode number becomes "XX:YY" where XX is the subvol number (tree id)
> and YY is the inode number.
>
> An alternate might be to report a number which might use up to 128 bits.
> Which is less likely to seriously break code?
>
> Note that code which ignores badly formatted lines is safe, because it
> will never currently find a match for a btrfs file in these files
> anyway.  The device number they report is never returned in st_dev for
> stat() on any file.
>
> The audit subsystem and one or two other places report dev/ino and so
> need enhancing, but I haven't tried to address those.
>
> Various trace points also report dev/ino.  I haven't tried thinking
> about those either.

I think this is a step in the right direction, but I want to figure out a way
to accomplish this without magical mount options that users must be aware of.

I think the stat() st_dev ship has sailed; we're stuck with that.  However,
Christoph does have a valid point that it breaks the various info spit out by
/proc.  You've done a good job with the treeid here, but it still makes it
impossible for somebody to map the st_dev back to the correct mount.

I think we aren't going to solve that problem, at least not with stat().  With
statx() spitting out the treeid we have given userspace a way to differentiate
subvolumes, so we should also fix statx() to spit out the super block device.
That way new userspace things can do their appropriate lookup if they so
choose.

This leaves the problem of nfsd.  Can you just integrate this new treeid into
nfsd, and use that to either change the ino within nfsd itself, or do something
similar to what your first patchset did and generate a fsid based on the
treeid?

Mount options are messy, and are just going to lead to distros turning them on
without understanding what's going on, and then we have to support them
forever.  I want to get this fixed in a way that we all hate the least, with as
little opportunity as possible for confused users to make bad decisions.

Thanks,

Josef
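
For illustration only, here is a rough userspace sketch of the lookup described above: a file is identified by device, inode and tree id together, so two files that collide on dev/ino but sit in different subvolumes still compare as different. The names FileKey, file_key() and same_file() are made up for this example, and the tree_id argument is a stand-in supplied by the caller; this sketch does not assume any existing interface that reports it.

```python
import os
from typing import NamedTuple, Optional


class FileKey(NamedTuple):
    """Identity of a file as userspace could see it (hypothetical type)."""
    dev: int                # st_dev as reported by stat()
    ino: int                # st_ino as reported by stat()
    tree_id: Optional[int]  # assumed subvolume id; NOT provided by os.stat()


def file_key(path: str, tree_id: Optional[int] = None) -> FileKey:
    # os.stat() only gives us dev and ino; the tree id has to come from
    # somewhere else (here the caller passes it in as a placeholder).
    st = os.stat(path)
    return FileKey(st.st_dev, st.st_ino, tree_id)


def same_file(a: FileKey, b: FileKey) -> bool:
    # Matching dev and ino is not enough on its own: the tree ids must
    # match as well before the two keys are treated as the same file.
    return a == b
```

A tool that today keys its tables on (st_dev, st_ino) would only need to widen the key, and old code that ignores the extra field keeps its current behaviour.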