Date: Fri, 28 Feb 2020 16:52:44 +0100
From: Christian Brauner
To: James Bottomley
Cc: Steven Whitehouse, Miklos Szeredi, Miklos Szeredi, David Howells, viro, Ian Kent, Christian Brauner, Jann Horn, "Darrick J. Wong", Linux API, linux-fsdevel, lkml
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
Message-ID: <20200228155244.k4h4hz3dqhl7q7ks@wittgenstein>
References: <158230810644.2185128.16726948836367716086.stgit@warthog.procyon.org.uk> <1582316494.3376.45.camel@HansenPartnership.com> <1582556135.3384.4.camel@HansenPartnership.com> <1582644535.3361.8.camel@HansenPartnership.com>
In-Reply-To: <1582644535.3361.8.camel@HansenPartnership.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 25, 2020 at 07:28:55AM -0800, James Bottomley wrote:
> On Tue, 2020-02-25 at 12:13 +0000, Steven Whitehouse wrote:
> > Hi,
> >
> > On 24/02/2020 15:28, Miklos Szeredi wrote:
> > > On Mon, Feb 24, 2020 at 3:55 PM James Bottomley
> > > wrote:
> > >
> > > > Once it's table driven, certainly a sysfs directory becomes
> > > > possible. The problem with ST_DEV is filesystems like btrfs and
> > > > xfs that may have multiple devices.
> > >
> > > For XFS there's always a single sb->s_dev though, that's what
> > > st_dev will be set to on all files.
> > >
> > > Btrfs subvolume is sort of a lightweight superblock, so basically
> > > all such st_dev's are aliases of the same master superblock. So
> > > lookup of all subvolume st_dev's could result in referencing the
> > > same underlying struct super_block (just like /proc/$PID will
> > > reference the same underlying task group regardless of which of the
> > > task group member's PID is used).
> > >
> > > Having this info in sysfs would spare us a number of issues that a
> > > set of new syscalls would bring. The question is, would that be
> > > enough, or is there a reason that sysfs can't be used to present
> > > the various filesystem related information that fsinfo is supposed
> > > to present?
> > >
> > > Thanks,
> > > Miklos
> > >
> >
> > We need a unique id for superblocks anyway. I had wondered about
> > using s_dev some time back, but for the reasons mentioned earlier in
> > this thread I think it might just land up being confusing and
> > difficult to manage. While fake s_devs are created for sbs that don't
> > have a device, I can't help thinking that something closer to
> > ifindex, but for superblocks, is needed here. That would avoid the
> > issue of which device number to use.
> >
> > In fact we need that anyway for the notifications, since without
> > that there is a race that can lead to missing remounts of the same
> > device, in case a umount/mount pair is missed due to an overrun, and
> > then fsinfo returns the same device as before, with potentially the
> > same mount options too. So I think a unique id for a superblock is a
> > generically useful feature, which would also allow for sensible sysfs
> > directory naming, if required,
>
> But would this be informative and useful for the user? I'm sure we can
> find a persistent id for a persistent superblock, but what about tmpfs
> ... that's going to have to change with every reboot. It's going to be
> remarkably inconvenient if I want to get fsinfo on /run to have to keep
> finding what the id is.
>
> The other thing a file descriptor does that sysfs doesn't is that it
> solves the information leak: if I'm in a mount namespace that has no
> access to certain mounts, I can't fspick them and thus I can't see the
> information. By default, with sysfs I can.

Difficult to figure out which part of the thread to reply to. :)

sysfs strikes me as fundamentally misguided for this task. Init systems
or any large-scale daemon will hate parsing things. There's that, and
part of the reason why mountinfo sucks is that it means parsing a
potentially enormous file. Exposing information in sysfs will require
parsing again one way or the other.
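
To make the parsing burden concrete, here is a minimal sketch of what a
consumer has to do for a single mountinfo line, assuming the field layout
documented in proc(5). The names MountEntry and parse_mountinfo_line are
made up for illustration, and the sample line is written in the style of
the proc(5) example, not taken from a real system. A daemon would have to
repeat this for every line of the file on every change.

```python
# Illustrative sketch only: field layout as documented in proc(5);
# MountEntry and parse_mountinfo_line are hypothetical names.
from typing import List, NamedTuple


class MountEntry(NamedTuple):
    mount_id: int
    parent_id: int
    dev: str                    # "major:minor" of the superblock
    root: str
    mount_point: str
    mount_options: str
    optional_fields: List[str]  # zero or more "tag[:value]" entries
    fs_type: str
    source: str
    super_options: str


def parse_mountinfo_line(line: str) -> MountEntry:
    """Split one /proc/<pid>/mountinfo line into its documented fields."""
    fields = line.split()
    # The optional fields start at index 6 and end at a lone "-" separator.
    sep = fields.index("-", 6)
    return MountEntry(
        mount_id=int(fields[0]),
        parent_id=int(fields[1]),
        dev=fields[2],
        root=fields[3],
        mount_point=fields[4],
        mount_options=fields[5],
        optional_fields=fields[6:sep],
        fs_type=fields[sep + 1],
        source=fields[sep + 2],
        super_options=fields[sep + 3],
    )


# Sample line for illustration, not read from a live system.
SAMPLE = "36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw,errors=continue"
entry = parse_mountinfo_line(SAMPLE)
```

Even this small parser has to cope with a variable number of optional
fields before the separator, which is the kind of detail every consumer
ends up reimplementing.
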
I've been discussing these bottlenecks with Lennart quite a bit, and
reliable and performant mount notifications without needing to parse
stuff are very high on the issue list. But even if that isn't an issue
for some reason, the namespace aspect is definitely something I'd
consider a no-go. James has been poking at this a little already and I
agree. More specifically, sysfs and proc already are a security
nightmare for namespace-aware workloads and require special care. Not
leaking information in any way is a difficult task. I mean, over the
last two years I sent quite a lot of patches to the networking-namespace
aware part of sysfs alone, either fixing information leaks or making
other parts namespace aware that weren't and were causing issues
(there's another large-ish series sitting in Dave's tree right now).
And tbh, network namespacing in sysfs is imho trivial compared to what
we would need to do to handle mount namespacing and especially mount
propagation.

fsinfo() is a way cleaner and ultimately simpler approach. We very much
want it file-descriptor based. The mount api opens up the road to secure
and _delegatable_ querying of filesystem information.

Christian