Date: Fri, 30 Jul 2021 00:13:31 +0000
From: Al Viro
To: Josef Bacik
Cc: "J. Bruce Fields", NeilBrown, Christoph Hellwig, Chuck Lever,
    Chris Mason, David Sterba, linux-fsdevel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly

On Wed, Jul 28, 2021 at 05:30:04PM -0400, Josef Bacik wrote:
> I don't think anybody has that many file systems.  For btrfs it's a
> single file system.  Think of syncfs, it's going to walk through all of
> the super blocks on the system calling ->sync_fs on each subvol
> superblock.
> Now this isn't a huge deal, we could just have some flag that says "I'm
> not real" or even just have anonymous superblocks that don't get added
> to the global super_blocks list, and that would address my main pain
> points.

Umm...  Aren't the snapshots read-only by definition?

> The second part is inode reclaim.  Again this particular problem could
> be avoided if we had an anonymous superblock that wasn't actually used,
> but the inode lru is per superblock.  Now with reclaim instead of
> walking all the inodes, you're walking a bunch of super blocks and then
> walking the list of inodes within those super blocks.  You're burning
> CPU cycles because now instead of getting big chunks of inodes to
> dispose, it's spread out across many super blocks.
>
> The other weird thing is the way we apply pressure to shrinker systems.
> We essentially say "try to evict X objects from your list", which means
> in this case with lots of subvolumes we'd be evicting waaaaay more
> inodes than we were before, likely impacting performance where you have
> workloads that have lots of files open across many subvolumes (which is
> what FB does with its containers).
>
> If we want an anonymous superblock per subvolume then the only way
> it'll work is if it's not actually tied into anything, and we still use
> the primary super block for the whole file system.  And if that's what
> we're going to do, what's the point of the super block exactly?  This
> approach that Neil's come up with seems like a reasonable solution to
> me.  Christoph gets his separation and /proc/self/mountinfo, and we
> avoid the scalability headache of a billion super blocks.  Thanks,

AFAICS, we also get arseloads of weird corner cases - in particular,
Neil's suggestions re visibility in /proc/mounts look rather arbitrary.

Al, really disliking the entire series...
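
A minimal sketch of the super_block walk Josef describes above: kernel-wide
sync iterates the global super_blocks list and calls ->sync_fs on every
registered superblock, so one super_block per subvolume turns "once per
filesystem" into "once per subvolume".  This is modelled loosely on the
sync_fs_one_sb()/iterate_supers() pattern in fs/sync.c, not copied from it;
sync_all_supers() is an invented wrapper name, used purely for illustration.

#include <linux/fs.h>

static void sync_fs_one_sb(struct super_block *sb, void *arg)
{
	/* skip read-only superblocks, ask the rest to flush via ->sync_fs */
	if (!sb_rdonly(sb) && sb->s_op->sync_fs)
		sb->s_op->sync_fs(sb, *(int *)arg);
}

/* hypothetical wrapper: walk every registered super_block, as sync(2) does */
void sync_all_supers(int wait)
{
	iterate_supers(sync_fs_one_sb, &wait);
}

The same per-superblock fan-out is what makes the shrinker point above bite:
the inode LRU and the "evict X objects" request are both per-super_block, so
more superblocks means more, smaller LRUs to scan under memory pressure.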