From: Miklos Szeredi
Date: Tue, 3 Mar 2020 11:13:50 +0100
Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
To: Christian Brauner
Cc: David Howells, Ian Kent, James Bottomley, Steven Whitehouse,
    Miklos Szeredi, viro, Christian Brauner, Jann Horn, "Darrick J. Wong",
    Linux API, linux-fsdevel, lkml, Greg Kroah-Hartman
References: <1582644535.3361.8.camel@HansenPartnership.com>
    <20200228155244.k4h4hz3dqhl7q7ks@wittgenstein>
    <107666.1582907766@warthog.procyon.org.uk>
    <0403cda7345e34c800eec8e2870a1917a8c07e5c.camel@themaw.net>
    <1509948.1583226773@warthog.procyon.org.uk>
    <20200303100045.zqntjjjv6npvs5zl@wittgenstein>
In-Reply-To: <20200303100045.zqntjjjv6npvs5zl@wittgenstein>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Mar 3, 2020 at 11:00 AM Christian Brauner wrote:
>
> On Tue, Mar 03, 2020 at 10:26:21AM +0100, Miklos Szeredi wrote:
> > On Tue, Mar 3, 2020 at 10:13 AM David Howells wrote:
> > >
> > > Miklos Szeredi wrote:
> > >
> > > > I'm doing a patch. Let's see how it fares in the face of all these
> > > > preconceptions.
> > >
> > > Don't forget the efficiency criterion. One reason for going with fsinfo(2) is
> > > that scanning /proc/mounts when there are a lot of mounts in the system is
> > > slow (not to mention the global lock that is held during the read).
> > >
> > > Now, going with sysfs files on top of procfs links might avoid the global
> > > lock, and you can avoid rereading the options string if you export a change
> > > notification, but you're going to end up injecting a whole lot of pathwalk
> > > latency into the system.
> >
> > Completely irrelevant. Cached lookup is so much optimized, that you
> > won't be able to see any of it.
> >
> > No, I don't think this is going to be a performance issue at all, but
> > if anything we could introduce a syscall
> >
> >   ssize_t readfile(int dfd, const char *path, char *buf,
> >                    size_t bufsize, int flags);
> >
> > that is basically the equivalent of open + read + close, or even a
> > vectored variant that reads multiple files. But that's off topic
> > again, since I don't think there's going to be any performance issue
> > even with plain I/O syscalls.
> >
> > >
> > > On top of that, it isn't going to help with the case that I'm working towards
> > > implementing where a container manager can monitor for mounts taking place
> > > inside the container and supervise them. What I'm proposing is that during
> > > the action phase (eg. FSCONFIG_CMD_CREATE), fsconfig() would hand an fd
> > > referring to the context under construction to the manager, which would then
> > > be able to call fsinfo() to query it and fsconfig() to adjust it, reject it or
> > > permit it.
> > > Something like:
> > >
> > >     fd = receive_context_to_supervise();
> > >     struct fsinfo_params params = {
> > >             .flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
> > >             .request = FSINFO_ATTR_SB_OPTIONS,
> > >     };
> > >     fsinfo(fd, NULL, &params, sizeof(params), buffer, sizeof(buffer));
> > >     supervise_parameters(buffer);
> > >     fsconfig(fd, FSCONFIG_SET_FLAG, "hard", NULL, 0);
> > >     fsconfig(fd, FSCONFIG_SET_STRING, "vers", "4.2", 0);
> > >     fsconfig(fd, FSCONFIG_CMD_SUPERVISE_CREATE, NULL, NULL, 0);
> > >     struct fsinfo_params params = {
> > >             .flags = FSINFO_FLAGS_QUERY_FSCONTEXT,
> > >             .request = FSINFO_ATTR_SB_NOTIFICATIONS,
> > >     };
> > >     struct fsinfo_sb_notifications sbnotify;
> > >     fsinfo(fd, NULL, &params, sizeof(params), &sbnotify, sizeof(sbnotify));
> > >     watch_super(fd, "", AT_EMPTY_PATH, watch_fd, 0x03);
> > >     fsconfig(fd, FSCONFIG_CMD_SUPERVISE_PERMIT, NULL, NULL, 0);
> > >     close(fd);
> > >
> > > However, the supervised mount may be happening in a completely different set
> > > of namespaces, in which case the supervisor presumably wouldn't be able to see
> > > the links in procfs and the relevant portions of sysfs.
> >
> > It would be a "jump" link to the otherwise invisible directory.
>
> More magic links to beam you around sounds like a bad idea. We had a
> bunch of CVEs around them in containers and they were one of the major
> reasons behind us pushing for openat2(). That's why it has a
> RESOLVE_NO_MAGICLINKS flag.

No, that link wouldn't beam you around at all, it would end up in an
internally mounted instance of a mountfs, a safe place where no
dangerous CVE's roam.

Thanks,
Miklos