Date: Mon, 30 Mar 2020 23:17:00 +0200
From: Christian Brauner
To: David Howells
Cc: torvalds@linux-foundation.org, viro@zeniv.linux.org.uk, dray@redhat.com,
 kzak@redhat.com, mszeredi@redhat.com, swhiteho@redhat.com,
 jlayton@redhat.com, raven@themaw.net, andres@anarazel.de,
 keyrings@vger.kernel.org, linux-fsdevel@vger.kernel.org,
 linux-kernel@vger.kernel.org, lennart@poettering.net, cyphar@cyphar.com
Subject: Re: Upcoming: Notifications, FS notifications and fsinfo()
Message-ID: <20200330211700.g7evnuvvjenq3fzm@wittgenstein>
References: <1445647.1585576702@warthog.procyon.org.uk>
In-Reply-To: <1445647.1585576702@warthog.procyon.org.uk>
X-Mailing-List:
 linux-kernel@vger.kernel.org

[Cc Lennart and Aleksa, both of whom also maintain projects that would
make use of this]

On Mon, Mar 30, 2020 at 02:58:22PM +0100, David Howells wrote:
>
> Hi Linus,
>
> I have three sets of patches I'd like to push your way, if you (and Al) are
> willing to consider them.
>
>  (1) General notification queue plus key/keyring notifications.
>
>      This adds the core of the notification queue built on pipes, and adds
>      the ability to watch for changes to keys.
>
>  (2) Mount and superblock notifications.
>
>      This builds on (1) to provide notifications of mount topology changes
>      and implements a framework for superblock events (configuration
>      changes, I/O errors, quota/space overruns and network status changes).
>
>  (3) Filesystem information retrieval.
>
>      This provides an extensible way to retrieve informational attributes
>      about mount objects and filesystems. This includes providing
>      information intended to make recovering from a notification queue
>      overrun much easier.
>
> We need (1) for Gnome to efficiently watch for changes in kerberos
> keyrings. Debarshi Ray has patches ready to go for gnome-online-accounts
> so that it can make use of the facility.
>
> Sets (2) and (3) can make libmount more efficient. Karel Zak is working on
> making use of this to avoid reading /proc/mountinfo.
>
> We need something to make systemd's watching of the mount topology more
> efficient, and (2) and (3) can help with this by making it faster to narrow
> down what changed. I think Karel has this in his sights, but hasn't yet
> managed to work on it.
>
> Set (2) should be able to make it easier to watch for mount options inside
> a container, and set (3) should make it easier to examine the mounts inside
> another mount namespace inside a container in a way that can't be done with
> /proc/mounts. This is requested by Christian Brauner.
>
> Jeff Layton has a tentative addition to (3) to expose error state to
> userspace, and Andres Freund would like this for Postgres.
>
> Set (3) further allows the information returned by such as statx() and
> ioctl(FS_IOC_GETFLAGS) to be qualified by indicating which bits are/aren't
> supported.
>
> Further, for (3), I also allow filesystem-specific overrides/extensions to
> fsinfo() and have a use for it to AFS to expose information about server
> preference for a particular volume (something that is necessary for
> implementing the toolset). I've provided example code that does similar
> for NFS and some that exposes superblock info from Ext4. At Vault, Steve
> expressed an interest in this for CIFS and Ted Ts'o expressed a possible
> interest for Ext4.
>
> Notes:
>
>  (*) These patches will conflict with apparently upcoming refactoring of
>      the security core, but the fixup doesn't look too bad:
>
>      https://lore.kernel.org/linux-next/20200330130636.0846e394@canb.auug.org.au/T/#u
>
>  (*) Miklós Szeredi would much prefer to implement fsinfo() as a magic
>      filesystem mounted on /proc/self/fsinfo/ whereby your open fds appear
>      as directories under there, each with a set of attribute files
>      corresponding to the attributes that fsinfo() would otherwise provide.
>      To examine something by filename, you'd have to open it O_PATH and
>      then read the individual attribute files in the corresponding per-fd
>      directory. A readfile() system call has been mooted to elide the
>      {open,read,close} sequence to make it more efficient.

Fwiw, putting down my kernel hat and speaking as someone who maintains
two container runtimes and various other low-level bits and pieces in
userspace, and who'd make heavy use of this stuff, I would prefer the
fd-based fsinfo() approach, especially in light of cross-namespace
operations, querying all properties of a mount atomically all at once,
and safe delegation through fds.
Another heavy user of this would be systemd (Cc'ed Lennart, with whom
I've discussed this), which would prefer the fd-based approach as well.
I think pulling this into a filesystem and making userspace parse around
in a filesystem tree to query mount information is the wrong approach
and will get messy pretty quickly, especially in the face of mount and
user namespace interactions and various other pitfalls. fsinfo() fits
quite nicely with the all-fd-based approach of the whole mount API. So
yes, definitely preferred from my end.

Christian
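
To make the cost of the attribute-file alternative concrete, here is a
minimal sketch of the {open,read,close} sequence that userspace would run
for every attribute. It is illustrative only: the /proc/self/fsinfo/
directory is a proposal and is not assumed to exist, so an ordinary
temporary directory stands in for the per-fd directory, and the attribute
names (mount_id, fstype, options) and the helper names read_attrs() and
demo() are hypothetical.

```python
import os
import tempfile


def read_attrs(attr_dir_fd, names):
    """Read each attribute file relative to attr_dir_fd.

    Every attribute costs its own open/read/close, so the values are
    gathered one at a time rather than as a single atomic snapshot.
    """
    values = {}
    syscalls = 0
    for name in names:
        fd = os.open(name, os.O_RDONLY, dir_fd=attr_dir_fd)
        try:
            data = os.read(fd, 4096)
        finally:
            os.close(fd)
        syscalls += 3  # one open, one read, one close per attribute
        values[name] = data.decode().strip()
    return values, syscalls


def demo():
    # Stand-in for the proposed per-fd directory. The attribute names and
    # values below are made up purely for illustration.
    with tempfile.TemporaryDirectory() as tmp:
        fake = {"mount_id": "42", "fstype": "ext4", "options": "rw,relatime"}
        for name, value in fake.items():
            with open(os.path.join(tmp, name), "w") as f:
                f.write(value + "\n")
        dir_fd = os.open(tmp, os.O_RDONLY | os.O_DIRECTORY)
        try:
            return read_attrs(dir_fd, list(fake))
        finally:
            os.close(dir_fd)


if __name__ == "__main__":
    values, syscalls = demo()
    print(values, syscalls)
```

With three attributes the loop issues nine system calls, and the mount can
change between any two reads. A single fd-based call that fills one buffer
would avoid both problems, which is the point about querying all
properties of a mount in one go.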