Subject: Re: [PATCH/RFC 00/11] expose btrfs subvols in mount table correctly
To: "J. Bruce Fields", NeilBrown
Cc: Christoph Hellwig, Chuck Lever, Chris Mason, David Sterba,
    Alexander Viro, linux-fsdevel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-btrfs@vger.kernel.org
References: <162742539595.32498.13687924366155737575.stgit@noble.brown>
    <20210728193536.GD3152@fieldses.org>
From: Josef Bacik
Date: Wed, 28 Jul 2021 17:30:04 -0400
In-Reply-To: <20210728193536.GD3152@fieldses.org>
X-Mailing-List: linux-nfs@vger.kernel.org

On 7/28/21 3:35 PM, J.
Bruce Fields wrote:
> I'm still stuck trying to understand why subvolumes can't get their own
> superblocks:
>
> - Why are the performance issues Josef raises unsurmountable?
>   And why are they unique to btrfs?  (Surely there other cases
>   where people need hundreds or thousands of superblocks?)
>

I don't think anybody has that many file systems. For btrfs it's a single
file system. Think of syncfs, it's going to walk through all of the super
blocks on the system calling ->sync_fs on each subvol superblock. Now this
isn't a huge deal, we could just have some flag that says "I'm not real" or
even just have anonymous superblocks that don't get added to the global
super_blocks list, and that would address my main pain points.

The second part is inode reclaim. Again this particular problem could be
avoided if we had an anonymous superblock that wasn't actually used, but the
inode lru is per superblock. Now with reclaim instead of walking all the
inodes, you're walking a bunch of super blocks and then walking the list of
inodes within those super blocks. You're burning CPU cycles because now
instead of getting big chunks of inodes to dispose, it's spread out across
many super blocks.

The other weird thing is the way we apply pressure to shrinker systems. We
essentially say "try to evict X objects from your list", which means in this
case with lots of subvolumes we'd be evicting waaaaay more inodes than we
were before, likely impacting performance where you have workloads that have
lots of files open across many subvolumes (which is what FB does with its
containers).

If we want an anonymous superblock per subvolume then the only way it'll work
is if it's not actually tied into anything, and we still use the primary
super block for the whole file system. And if that's what we're going to do
what's the point of the super block exactly? This approach that Neil's come
up with seems like a reasonable solution to me. Christoph gets his separation
and /proc/self/mountinfo, and we avoid the scalability headache of a billion
super blocks. Thanks,

Josef
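
The shrinker point above ("try to evict X objects from your list") can be illustrated with a toy model. This is only a sketch of the behaviour as the message describes it, not the kernel's actual shrinker arithmetic. The `SuperBlock` class, `apply_pressure`, the batch size of 128, and the inode counts are all hypothetical names and numbers chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SuperBlock:
    """Toy stand-in for a superblock that owns its own inode LRU list."""
    name: str
    lru: list = field(default_factory=list)

    def shrink(self, nr_to_scan: int) -> int:
        """Evict up to nr_to_scan inodes from this LRU; return how many went."""
        evicted = min(nr_to_scan, len(self.lru))
        del self.lru[:evicted]
        return evicted


def apply_pressure(superblocks, nr_to_scan):
    """Ask every superblock to evict up to nr_to_scan objects.

    Returns (lists_walked, inodes_evicted). Assumption: each list gets the
    same fixed request, which is the simplification used in the message.
    """
    evicted = sum(sb.shrink(nr_to_scan) for sb in superblocks)
    return len(superblocks), evicted


def make_inodes(count, prefix):
    """Build placeholder inode names; real inodes are irrelevant here."""
    return [f"{prefix}-inode-{i}" for i in range(count)]


# One superblock holding every cached inode of the file system.
single = [SuperBlock("fs", make_inodes(10_000, "fs"))]

# The same 10,000 inodes spread over 100 per-subvolume superblocks.
per_subvol = [
    SuperBlock(f"subvol{n}", make_inodes(100, f"subvol{n}"))
    for n in range(100)
]

single_result = apply_pressure(single, nr_to_scan=128)
per_subvol_result = apply_pressure(per_subvol, nr_to_scan=128)

print(single_result)      # (1, 128)
print(per_subvol_result)  # (100, 10000)
```

Under these assumptions the same round of pressure walks 100 lists instead of one and empties the whole cache instead of trimming 128 inodes, which is the kind of over-eviction and extra list walking the message is worried about.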