Date: Thu, 12 Jul 2018 18:44:03 +0100
From: Al Viro
To: Linus Torvalds
Cc: David Howells, Andrew Lutomirski, Linux API, linux-fsdevel,
    Linux Kernel Mailing List, Jann Horn
Subject: Re: [PATCH 24/32] vfs: syscall: Add fsopen() to prepare for
    superblock creation [ver #9]
Message-ID: <20180712174403.GA30522@ZenIV.linux.org.uk>

On Thu, Jul 12, 2018 at 10:14:05AM -0700, Linus Torvalds wrote:
> On Thu, Jul 12, 2018 at 9:39 AM Linus Torvalds wrote:
> >
> > I agree that a system call is likely
> > saner. Especially since we'd have
> > one to _start_ this (ie "fsopen()") it would make sense to have the
> > one to finalize it.
>
> Side note: if we can make do with just a buffer, then we wouldn't need
> "fsopen()". You could literally just open a pipe, and write to it.
> It's got 16 pages worth of buffers by default, and you can increase it
> (within reason) as root.
>
> Of course, depending on IO patterns, not all the buffer pages are
> necessarily fully used, so it's not like you get a buffer of size
> PAGE_SIZE*16, but we do merge buffers so you should be fairly close.
>
> Then you really could do without a fsopen(). Just fill a pipe with
> data, and do "fsmount()" on the pipe contents.
>
> Added upside? You can use "iov_iter_pipe()" to iterate over all that data.
>
> I'm only half joking.

One semi-historical note here. Originally, mount(2) (and it had been
there since v1) had only one filesystem type to deal with. So it was
really just "mount <device> on <directory>, read-only or read-write".
3 arguments, two strings and one flag (flag, BTW, was a later addition).

It didn't last. I can dig out the archaeological notes and cut'n'paste
the whole horror story here, but that'll be way too long and scary. By
4.2BSD times there had been essentially an enum encoding the filesystem
type and a type-tagged union of structs with type-dependent options.
Plus some options taking more bits in what used to be the "is it r/w?"
flag.

Leaving aside the whole "mount new/bind/remount/etc." overloading we
have in mount(2) today, we have a bunch of named filesystems, each with
its own set of options. The device name has not been anything special
for many decades; the type name is what's universally present and
that's what decides how the rest (including "device name") is to be
interpreted.

Fundamentally, we start with selecting (by name) a filesystem driver
we'll be talking to.
The rest (device name + string options + flags like noexec that are
not handled on VFS level) is given to that driver, which either tells
us to take a hike or gives us a dentry tree that can be attached.

Separating type name from everything else makes a lot of sense, simply
because it's what determines the parsing and interpretation of the
rest.

Speaking of half-joking, I suggested AF_FSTYPE at some point. Then
fsopen(2) would be connect(2)...

I think that having that (connection used to talk to fs driver, with
or without an already set up fs instance we are talking about) as a
first-class object makes sense. That's completely unrelated to the
question of buffering, of course.
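
To make the "type name decides how everything else is parsed" point
concrete, here is a toy userspace model. It is only a sketch under
stated assumptions: every name in it (toy_diskfs, toy_virtfs, open_fs,
mount, split_options) is hypothetical and made up for illustration,
and nothing here is kernel code or the proposed fsopen() interface.

```python
from typing import Callable, Dict, Optional

# Toy model only: all names below are hypothetical, not kernel APIs.


def split_options(options: str) -> dict:
    """Split "a=1,b" into {"a": "1", "b": True}; empty items are skipped."""
    result = {}
    for item in filter(None, options.split(",")):
        key, _, value = item.partition("=")
        result[key] = value or True
    return result


def parse_disk_fs(source: Optional[str], options: str) -> dict:
    """Driver that treats 'source' as a device name and insists on one."""
    if not source:
        raise ValueError("this filesystem type needs a device name")
    return {"device": source, "options": split_options(options)}


def parse_virtual_fs(source: Optional[str], options: str) -> dict:
    """Driver with no backing device: 'source' is simply ignored."""
    return {"device": None, "options": split_options(options)}


# The type name is the only thing the generic layer understands.
DRIVERS: Dict[str, Callable[[Optional[str], str], dict]] = {
    "toy_diskfs": parse_disk_fs,
    "toy_virtfs": parse_virtual_fs,
}


def open_fs(fstype: str) -> Callable[[Optional[str], str], dict]:
    """Step 1: select the driver by name, or refuse."""
    try:
        return DRIVERS[fstype]
    except KeyError:
        raise LookupError(f"unknown filesystem type: {fstype}") from None


def mount(fstype: str, source: Optional[str], options: str) -> dict:
    """Step 2: hand the rest to the chosen driver, which interprets it."""
    driver = open_fs(fstype)
    return driver(source, options)
```

In this model, mount("toy_virtfs", "ignored", "size=64m,ro") succeeds
and drops the source string, while mount("toy_diskfs", "", "") is
rejected by the driver itself. The generic layer never looks at the
source or the options; it only resolves the type name.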