Subject: Re: A Third perspective on BTRFS nfsd subvol dev/inode number issues.
From: Josef Bacik
To: Amir Goldstein, NeilBrown
Cc: Al Viro, Miklos Szeredi, Christoph Hellwig, "J. Bruce Fields",
    Chuck Lever, Chris Mason, David Sterba, linux-fsdevel,
    Linux NFS list, Btrfs BTRFS
Date: Mon, 2 Aug 2021 09:53:41 -0400
Message-ID: <2337f1ba-ffed-2369-47a0-5ffda2d8b51c@toxicpanda.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On 8/2/21 3:54 AM, Amir Goldstein wrote:
> On Mon, Aug 2, 2021 at 8:41 AM NeilBrown wrote:
>>
>> On Mon, 02 Aug 2021, Al Viro wrote:
>>> On Mon, Aug 02, 2021 at 02:18:29PM +1000, NeilBrown wrote:
>>>
>>>> It think we need to bite-the-bullet and decide that 64bits is not
>>>> enough, and in fact no number of bits will ever be enough. overlayfs
>>>> makes this clear.
>>>
>>> Sure - let's go for broke and use XML. Oh, wait - it's 8 months too
>>> early...
>>>
>>>> So I think we need to strongly encourage user-space to start using
>>>> name_to_handle_at() whenever there is a need to test if two things are
>>>> the same.
>>>
>>> ... and forgetting the inconvenient facts, such as that two different
>>> fhandles may correspond to the same object.
>>
>> Can they? They certainly can if the "connectable" flag is passed.
>> name_to_handle_at() cannot set that flag.
>> nfsd can, so using name_to_handle_at() on an NFS filesystem isn't quite
>> perfect. However it is the best that can be done over NFS.
>>
>> Or is there some other situation where two different filehandles can be
>> reported for the same inode?
>>
>> Do you have a better suggestion?
>>
>
> Neil,
>
> I think the plan of "changing the world" is not very realistic.
> Sure, *some* tools can be changed, but all of them?
>
> I went back to read your initial cover letter to understand the
> problem and what I mostly found there was that the view of
> /proc/x/mountinfo was hiding information that is important for
> some tools to understand what is going on with btrfs subvols.
>
> Well I am not a UNIX history expert, but I suppose that
> /proc/PID/mountinfo was created because /proc/mounts and
> /proc/PID/mounts no longer provided tool with all the information
> about Linux mounts.
>
> Maybe it's time for a new interface to query the more advanced
> sb/mount topology? fsinfo() maybe? With mount2 compatible API for
> traversing mounts that is not limited to reporting all entries inside
> a single page. I suppose we could go for some hierarchical view
> under /proc/PID/mounttree. I don't know - new API is hard.
>
> In any case, instead of changing st_dev and st_ino or changing the
> world to work with file handles, why not add inode generation (and
> maybe subvol id) to statx().
> filesystem that care enough will provide this information and tools that
> care enough will use it.
>

Can y'all wait till I'm back from vacation, goddamn ;)

This is what I'm aiming for. I spent some time looking at how many
places we string parse /proc//mounts and my head hurts.

Btrfs already has a reasonable solution for this: we have UUIDs for
everything. UUIDs aren't strictly a btrfs thing either; all the file
systems have some sort of UUID identifier, hell, it's built into blkid.
I would love it if we could do a better job of letting applications
query information about where they are. And we could expose this with
the relatively common UUID format.
You ask what fs you're in, you get the FS UUID, and then if you're on
Btrfs you get the specific subvolume UUID you're in. That way you could
do fancier things, like knowing whether you've wandered into a completely
new file system or just a different subvolume.

We have to keep the st_ino/st_dev thing for backwards compatibility, but
we can make it easier to get more info out of the file system.

We could in theory also expose just the subvolid, since that's a nice
simple u64, but it limits our ability to do new fancy shit in the future.
It's not a bad solution, but like I said, I think we need to take a step
back, figure out what problem we're specifically trying to solve, and
work from there. Starting from automounts and working our way back is
not going very well.

Thanks,

Josef
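
For reference, here is a minimal sketch of the st_dev/st_ino comparison that existing tools rely on and that has to keep working. It is only an illustration using Python's standard os.stat(); the helper names are hypothetical and do not come from any patch in this thread.

```python
import os


def file_identity(path):
    """Return the (st_dev, st_ino) pair that tools traditionally compare."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def same_object(path_a, path_b):
    """True if both paths resolve to the same (st_dev, st_ino) pair."""
    return file_identity(path_a) == file_identity(path_b)


def crosses_device(parent, child):
    """True if st_dev differs between parent and child.

    On btrfs each subvolume reports its own st_dev, so this check alone
    cannot tell "new file system" apart from "new subvolume". That is
    the gap a FS UUID plus subvolume UUID query would fill.
    """
    return os.stat(parent).st_dev != os.stat(child).st_dev
```

A tool walking a tree would call crosses_device() on each directory and its parent to decide whether to stop at a boundary. With only st_dev to go on it has to treat every subvolume as a separate file system.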