From: Amir Goldstein
Date: Wed, 6 Jun 2018 12:49:48 +0300
Subject: Re: [RFC][PATCH 0/76] vfs: 'views' for filesystems with more than one root
To: Jeff Mahoney
Cc: Dave Chinner, Mark Fasheh, linux-fsdevel, linux-kernel, Linux Btrfs, Miklos Szeredi, David Howells, Jan Kara
In-Reply-To: <5b2ae799-1595-c262-7b65-41b10c11906d@suse.com>
References: <20180508180436.716-1-mfasheh@suse.de> <20180508233840.GM10363@dastard> <20180509064103.GP10363@dastard> <5b2ae799-1595-c262-7b65-41b10c11906d@suse.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 5, 2018 at 11:17 PM, Jeff Mahoney wrote:
> Sorry, just getting back to this.
>
> On 5/9/18 2:41 AM, Dave Chinner wrote:
>> On Tue, May 08, 2018 at 10:06:44PM -0400, Jeff Mahoney wrote:
>>> On 5/8/18 7:38 PM, Dave Chinner wrote:
>>>> On Tue, May 08, 2018 at 11:03:20AM -0700, Mark Fasheh wrote:
>>>>> Hi,
>>>>>
>>>>> The VFS's super_block covers a variety of filesystem functionality. In
>>>>> particular we have a single structure representing both I/O and
>>>>> namespace domains.
>>>>>
>>>>> There are requirements to de-couple this functionality. For example,
>>>>> filesystems with more than one root (such as btrfs subvolumes) can
>>>>> have multiple inode namespaces.
>>>>> This starts to confuse userspace when
>>>>> it notices multiple inodes with the same inode/device tuple on a
>>>>> filesystem.
>>>>

Speaking as someone who joined this discussion late, maybe years after
it started, it would help to get an overview of the existing problems and
how fs_view aims to solve them.

I do believe that both Overlayfs and Btrfs can benefit from a layer of
abstraction in the VFS, but I think it is best if we start by laying out
all the common problems and then see what a solution would look like.

Even the name of the abstraction (fs_view) doesn't make it clear to me
what it is we are abstracting (security context? st_dev? what else?).
It is probably best to try to describe the abstraction from the user's POV
rather than give sporadic examples of what MAY go into fs_view.

While at it, we need to see if this discussion has any intersections with
David Howells' fs_context work, because if we consider adding subvolume
support to the VFS, we may want to leave some reserved bits in the API
for it.

[...]

> One thing is clear: If we want to solve the btrfs and overlayfs problems
> in the same way, the view approach with a simple static mapping doesn't
> work. Sticking something between the inode and superblock doesn't get
> the job done when the belongs to a different file system. Overlayfs
> needs a per-object remapper, which means a callback that takes a path.
> Suddenly the way we do things in the SUSE kernel doesn't seem so hacky
> anymore.
>

And what is the SUSE way?

> I'm not sure we need the same solution for btrfs and overlayfs. It's
> not the same problem. Every object in overlayfs has a unique mapping
> already. If we report s_dev and i_ino from the inode, it still maps to
> a unique user-visible object. It may not map back to the overlayfs
> name, but that's a separate issue that's more difficult to solve. The
> btrfs issue isn't one of presenting an alternative namespace to the
> user.
> Btrfs has multiple internal namespaces and no way to publish them
> to the rest of the kernel.
>

FYI, the Overlayfs file/inode mapping is about to change with many VFS
hacks queued for removal, so stay tuned.

[...]

>> My point is that if we are talking about infrastructure to remap
>> what userspace sees from different mountpoint views into a
>> filesystem, then it should be done above the filesystem layers in
>> the VFS so all filesystems behave the same way. And in this case,
>> the vfsmount maps exactly to the "fs_view" that Mark has proposed we
>> add to the superblock.....
>
> It's proposed to be above the superblock with a default view in the
> superblock. It would sit between the inode and the superblock so we
> have access to it anywhere we already have an inode. That's the main
> difference. We already have the inode everywhere it's needed. Plumbing
> a vfsmount everywhere needed means changing code that only requires an
> inode and doesn't need a vfsmount.
>
> The two biggest problem areas:
> - Writeback tracepoints print a dev/inode pair. Do we want to plumb a
>   vfsmount into __mark_inode_dirty, super_operations->write_inode,
>   __writeback_single_inode, writeback_sb_inodes, etc?
> - Audit. As it happens, most of audit has a path or file that can be
>   used. We do run into problems with fsnotify. fsnotify_move is called
>   from vfs_rename which turns into a can of worms pretty quickly.
>

Can you please elaborate on that problem?
Do you mean that when watching a directory for changes, you need to be
able to tell which fs_view the watched directory inode is in?

>>> It makes sense for that to be above the
>>> superblock because the file system doesn't care about them. We're
>>> interested in the inode namespace, which for every other file system can
>>> be described using an inode and a superblock pair, but btrfs has another
>>> layer in the middle: inode -> btrfs_root -> superblock.
>>
>> Which seems to me to be irrelevant if there's a vfsmount per
>> subvolume that can hold per-subvolume information.
>
> I disagree. There are a ton of places where we only have access to an
> inode and only need access to an inode. It also doesn't solve the
> overlayfs issue.
>

I have an interest in solving another problem.
In VFS operations where only the inode is available, I would like to be
able to report fsnotify events (e.g. fsnotify_move()) only in directories
under a certain subtree root. That could be achieved either by bind
mounting the subtree root and passing the vfsmount into vfs_rename(), or
by defining an fs_view on the subtree and mounting that fs_view.

Thanks,
Amir.