From: Mark Fasheh
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: [RFC][PATCH 0/76] vfs: 'views' for filesystems with more than one root
Date: Tue, 8 May 2018 11:03:20 -0700
Message-Id: <20180508180436.716-1-mfasheh@suse.de>

Hi,

The VFS's super_block covers a variety of filesystem functionality. In
particular we have a single structure representing both I/O and
namespace domains.

There are requirements to de-couple this functionality. For example,
filesystems with more than one root (such as btrfs subvolumes) can
have multiple inode namespaces. This starts to confuse userspace when
it notices multiple inodes with the same inode/device tuple on a
filesystem.
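
To make the userspace side concrete: tools generally treat the
(st_dev, st_ino) pair returned by stat(2) as the identity of a file.
Below is a minimal sketch of that check. same_file() is a made-up
helper for illustration, not code from any particular tool.

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Userspace view of file identity: two stat results are taken to name
 * the same file when both the device number and the inode number match.
 * (same_file() is a hypothetical helper, for illustration only.)
 */
static inline bool same_file(const struct stat *a, const struct stat *b)
{
	return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}
```

If two distinct files on one filesystem report the same pair, a check
like this will wrongly treat them as one file.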

In addition, it's currently impossible for a filesystem subvolume to
have a different security context from its parent. If we could allow
for subvolumes to optionally specify their own security context, we
could use them as containers directly instead of having to go through
an overlay.

I ran into this particular problem with respect to Btrfs some years
ago and sent out a very naive set of patches which were (rightfully)
not incorporated:

https://marc.info/?l=linux-btrfs&m=130074451403261&w=2
https://marc.info/?l=linux-btrfs&m=130532890824992&w=2

During the discussion, one question did come up - why can't
filesystems like Btrfs use a superblock per subvolume? There are a
couple of problems with that:

- It's common for a single Btrfs filesystem to have thousands of
  subvolumes. So keeping a superblock for each subvol in memory would
  get prohibitively expensive - imagine having 8000 copies of struct
  super_block for a file system just because we wanted some separation
  of say, s_dev.

- Writeback would also have to walk all of these superblocks - again
  not very good for system performance.

- Anyone wanting to lock down I/O on a filesystem would have to freeze
  all the superblocks. This goes for most things related to I/O
  really - we simply can't afford to have the kernel walking thousands
  of superblocks to sync a single fs.

It's far more efficient then to pull those fields we need for a
subvolume namespace into their own structure.

The following patches attempt to fix this issue by introducing a
structure, fs_view, which can be used to represent a 'view' into a
filesystem. We can migrate super_block fields to this structure one at
a time. Struct super_block gets a default view embedded into it.
Inodes get a new field, i_view, which can be dereferenced to get the
view that an inode belongs to. By default, we point i_view to the view
on struct super_block. That way existing filesystems don't have to do
anything different.
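
As a rough illustration of the layout described above, here is a
userspace mock-up, not the kernel code from the patches. fs_view,
i_view, v_dev, inode_view() and inode_sb() are names used in this
series; v_sb, s_view and the two init helpers are assumed names chosen
only for this sketch.

```c
struct super_block;

/*
 * A 'view' into a filesystem. v_dev is the name used in this series;
 * v_sb is an assumed name for the pointer back to the superblock.
 */
struct fs_view {
	struct super_block *v_sb;
	unsigned int v_dev;	/* stands in for dev_t in this mock-up */
};

/* Only the relevant part: the embedded default view (assumed name). */
struct super_block {
	struct fs_view s_view;
};

/* Only the relevant part: the inode points at the view it belongs to. */
struct inode {
	struct fs_view *i_view;
};

static inline struct fs_view *inode_view(const struct inode *inode)
{
	return inode->i_view;
}

static inline struct super_block *inode_sb(const struct inode *inode)
{
	return inode->i_view->v_sb;
}

/* Default case: point the inode at the view embedded in the superblock. */
static inline void inode_init_default_view(struct inode *inode,
					   struct super_block *sb)
{
	sb->s_view.v_sb = sb;
	inode->i_view = &sb->s_view;
}

/*
 * Hypothetical per-subvolume case: the view carries its own device
 * number but still leads back to the one shared superblock.
 */
static inline void inode_init_subvol_view(struct inode *inode,
					  struct fs_view *view,
					  struct super_block *sb,
					  unsigned int dev)
{
	view->v_sb = sb;
	view->v_dev = dev;
	inode->i_view = view;
}
```

In this sketch an inode set up through either helper reaches the same
super_block via inode_sb(), while inode_view(inode)->v_dev can differ
per view.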

The patches are careful not to grow the size of struct inode.

For the first patch series, we migrate s_dev over from struct
super_block to struct fs_view. This fixes a long-standing bug in how
the kernel reports inode devices to userspace.

The series follows an order:

- We first introduce the fs_view structure and embed it into struct
  super_block. As discussed, struct inode gets a pointer to the
  fs_view, i_view. The only member on fs_view at this point is a
  super_block * so that we can replace i_sb. A helper function is
  provided to get to the super_block from a struct inode.

- Convert the kernel to use our helper function to get to i_sb. This
  is done in per-filesystem patches. The other parts of the kernel
  referencing i_sb get their changes batched up in logical groupings.

- Move s_dev from struct super_block to struct fs_view.

- Convert the kernel from inode->i_sb->s_dev to the device from our
  fs_view. In the end, these lines will look like
  inode_view(inode)->v_dev.

- Add an fs_view struct to each Btrfs root, point inodes to that view
  when we initialize them.

The patches are available via git and are based off Linux v4.16.
There are two branches, with identical code.

- With the inode_sb() changeover patch broken out (as is sent here):

  https://github.com/markfasheh/linux fs_view-broken-out

- With the inode_sb() changeover patch in one big change:

  https://github.com/markfasheh/linux fs_view

Comments are appreciated.

Thanks,
  --Mark