From: Miklos Szeredi
Date: Thu, 19 Mar 2020 13:36:58 +0100
Subject: Re: [PATCH 00/13] VFS: Filesystem information [ver #19]
To: David Howells
Cc: Linus Torvalds, Al Viro, Linux NFS list, Andreas Dilger,
    Anna Schumaker, "Theodore Ts'o", Linux API,
    linux-ext4@vger.kernel.org, Trond Myklebust, Ian Kent,
    Miklos Szeredi, Christian Brauner, Jann Horn, "Darrick J. Wong",
    Karel Zak, Jeff Layton, linux-fsdevel@vger.kernel.org, LSM,
    linux-kernel@vger.kernel.org
References: <158454408854.2864823.5910520544515668590.stgit@warthog.procyon.org.uk>
    <3085880.1584614257@warthog.procyon.org.uk>
In-Reply-To: <3085880.1584614257@warthog.procyon.org.uk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 19, 2020 at 11:37 AM David Howells wrote:
>
> Miklos Szeredi wrote:
>
> > > (2) It's more efficient as we can return specific binary data rather than
> > >     making huge text dumps. Granted, sysfs and procfs could present the
> > >     same data, though as lots of little files which have to be
> > >     individually opened, read, closed and parsed.
> >
> > Asked this a number of times, but you haven't answered yet: what
> > application would require such a high efficiency?
>
> Low efficiency means more time doing this when that time could be spent doing
> other things - or even putting the CPU in a powersaving state. Using an
> open/read/close render-to-text-and-parse interface *will* be slower and less
> efficient as there are more things you have to do to use it.
>
> Then consider doing a walk over all the mounts in the case where there are
> 10000 of them - we have issues with /proc/mounts for such. fsinfo() will end
> up doing a lot less work.

Current /proc/mounts problems arise from the fact that mount info can
only be queried for the whole namespace, and hence changes related to
a single mount will require rescanning the complete mount list. If
mount info can be queried for individual mounts, then the need to scan
the complete list will be rare. That's *the* point of this change.

> > > (3) We wouldn't have the overhead of open and close (even adding a
> > >     self-contained readfile() syscall has to do that internally
> >
> > Busted: add f_op->readfile() and be done with all that. For example
> > DEFINE_SHOW_ATTRIBUTE() could be trivially moved to that interface.
>
> Look at your example. "f_op->". That's "file->f_op->" I presume.
>
> You would have to make it "i_op->" to avoid the open and the close - and for
> things like procfs and sysfs, that's probably entirely reasonable - but bear
> in mind that you still have to apply all the LSM file security controls, just
> in case the backing filesystem is, say, ext4 rather than procfs.
>
> > We could optimize existing proc, sys, etc. interfaces, but it's not
> > been an issue, apparently.
>
> You can't get rid of or change many of the existing interfaces. A lot of them
> are effectively indirect system calls and are, as such, part of the fixed
> UAPI. You'd have to add a parallel optimised set.

Sure. We already have the single_open() internal API that is
basically a ->readfile() wrapper. Moving this up to the f_op level
(no, it's not an i_op, and yes, we do need struct file, but it can be
simply allocated on the stack) is a trivial optimization that would
let a readfile(2) syscall access that level. No new complexity in
that case. Same generally goes for seq_file: seq_readfile() is
trivial to implement without messing with current implementation or
any existing APIs.
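
For illustration only, here is roughly what the open/read/close
sequence under discussion looks like from userspace, and what a
readfile(2)-style call would collapse into a single step. This is a
sketch, not the proposed kernel interface: the helper name readfile()
and its bufsize parameter are made up for the example, and the file
contents are invented.

```python
import os
import tempfile


def readfile(path: str, bufsize: int = 4096) -> bytes:
    """Hypothetical userspace stand-in: open, read once, close."""
    fd = os.open(path, os.O_RDONLY)      # syscall 1: open
    try:
        return os.read(fd, bufsize)      # syscall 2: one read, no looping
    finally:
        os.close(fd)                     # syscall 3: close


def demo() -> bytes:
    # Create a small attribute-like file (contents are made up),
    # then fetch it back through the three-syscall helper.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attr")
        with open(path, "wb") as f:
            f.write(b"ro,nosuid\n")
        return readfile(path)


if __name__ == "__main__":
    print(demo())
```

Each small attribute file costs three syscalls this way; the argument
above is about where in the kernel that cost could be trimmed.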

> > > (6) Don't have to create/delete a bunch of sysfs/procfs nodes each time a
> > >     mount happens or is removed - and since systemd makes much use of
> > >     mount namespaces and mount propagation, this will create a lot of
> > >     nodes.
> >
> > Not true.
>
> This may not be true if you roll your own special filesystem. It *is* true if
> you do it in procfs or sysfs. The files don't exist if you don't create nodes
> or attribute tables for them.

That's one of the reasons why I opted to roll my own. But the ideas
therein could be applied to kernfs, if found to be generally useful.
Nothing magic about that.

>
> > > The argument for doing this through procfs/sysfs/somemagicfs is that
> > > someone using a shell can just query the magic files using ordinary text
> > > tools, such as cat - and that has merit - but it doesn't solve the
> > > query-by-pathname problem.
> > >
> > > The suggested way around the query-by-pathname problem is to open the
> > > target file O_PATH and then look in a magic directory under procfs
> > > corresponding to the fd number to see a set of attribute files[*] laid out.
> > > Bash, however, can't open by O_PATH or O_NOFOLLOW as things stand...
> >
> > Bash doesn't have fsinfo(2) either, so that's not really a good argument.
>
> I never claimed that fsinfo() could be accessed directly from the shell. For
> you proposal, you claimed "immediately usable from all programming languages,
> including scripts".

You are right. Note however: only special files need the O_PATH
handling; regular files and directories can be opened by the shell
without side effects.

In any case, I think neither of us can be convinced that the other is
right, so I guess it's up to Al and Linus to make a decision.

Thanks,
Miklos