Subject: Re: [PATCH 00/17] VFS: Filesystem information and notifications [ver #17]
From: Jens Axboe
To: Jeff Layton, Greg Kroah-Hartman, Jann Horn
Cc: Miklos Szeredi, Karel Zak, David Howells, Ian Kent, Christian Brauner,
 James Bottomley, Steven Whitehouse, viro, "Darrick J. Wong", Linux API,
 linux-fsdevel, lkml
Date: Tue, 3 Mar 2020 12:07:38 -0700
Message-ID: <5394c5c4-aeb8-97d5-8347-e763a1abd9ed@kernel.dk>
References: <1509948.1583226773@warthog.procyon.org.uk>
 <20200303113814.rsqhljkch6tgorpu@ws.net.home>
 <20200303130347.GA2302029@kroah.com>
 <20200303131434.GA2373427@kroah.com>
 <20200303134316.GA2509660@kroah.com>
 <20200303141030.GA2811@kroah.com>
 <20200303142407.GA47158@kroah.com>
 <030888a2-db3e-919d-d8ef-79dcc10779f9@kernel.dk>
 <7a05adc8-1ca9-c900-7b24-305f1b3a9b86@kernel.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 3/3/20 12:02 PM, Jeff Layton wrote:
> On Tue, 2020-03-03 at 09:55 -0700, Jens Axboe wrote:
>> On 3/3/20 9:51 AM, Jeff Layton wrote:
>>> On Tue, 2020-03-03 at 08:44 -0700, Jens Axboe wrote:
>>>> On 3/3/20 7:24 AM, Greg Kroah-Hartman wrote:
>>>>> On Tue, Mar 03, 2020 at 03:13:26PM +0100, Jann Horn wrote:
>>>>>> On Tue, Mar 3, 2020 at 3:10 PM Greg Kroah-Hartman wrote:
>>>>>>> On Tue, Mar 03, 2020 at 02:43:16PM +0100, Greg Kroah-Hartman wrote:
>>>>>>>> On Tue, Mar 03, 2020 at 02:34:42PM +0100, Miklos Szeredi wrote:
>>>>>>>>> On Tue, Mar 3, 2020 at 2:14 PM Greg Kroah-Hartman wrote:
>>>>>>>>>
>>>>>>>>>>> Unlimited beers for a 21-line kernel patch? Sign me up!
>>>>>>>>>>>
>>>>>>>>>>> Totally untested, barely compiled patch below.
>>>>>>>>>>
>>>>>>>>>> Ok, that didn't even build, let me try this for real now...
>>>>>>>>>
>>>>>>>>> Some comments on the interface:
>>>>>>>>
>>>>>>>> Ok, hey, let's do this proper :)
>>>>>>>
>>>>>>> Alright, how about this patch.
>>>>>>>
>>>>>>> Actually tested with some simple sysfs files.
>>>>>>>
>>>>>>> If people don't strongly object, I'll add "real" tests to it, hook it up
>>>>>>> to all arches, write a manpage, and all the fun fluff a new syscall
>>>>>>> deserves and submit it "for real".
>>>>>>
>>>>>> Just FYI, io_uring is moving towards the same kind of thing... IIRC
>>>>>> you can already use it to batch a bunch of open() calls, then batch a
>>>>>> bunch of read() calls on all the new fds and close them at the same
>>>>>> time. And I think they're planning to add support for doing
>>>>>> open()+read()+close() all in one go, too, except that it's a bit
>>>>>> complicated because passing forward the file descriptor in a generic
>>>>>> way is a bit complicated.
>>>>>
>>>>> It is complicated, I wouldn't recommend using io_uring for reading a
>>>>> bunch of procfs or sysfs files, that feels like a ton of overkill with
>>>>> too much setup/teardown to make it worthwhile.
>>>>>
>>>>> But maybe not, will have to watch and see how it goes.
>>>>
>>>> It really isn't, and I too think it makes more sense than having a
>>>> system call just for the explicit purpose of open/read/close. As Jann
>>>> said, you can't currently do a linked sequence of open/read/close,
>>>> because the fd passing between them isn't done. But that will come in
>>>> the future. If the use case is "a bunch of files", then you could
>>>> trivially do "open bunch", "read bunch", "close bunch" in three separate
>>>> steps.
>>>>
>>>> Curious what the use case is for this that warrants a special system
>>>> call?
>>>>
>>>
>>> Agreed. I'd really rather see something more general-purpose than the
>>> proposed readfile(). At least with NFS and SMB, you can compound
>>> together fairly arbitrary sorts of operations, and it'd be nice to be
>>> able to pattern calls into the kernel for those sorts of uses.
>>>
>>> So, NFSv4 has the concept of a current_stateid that is maintained by the
>>> server. So basically you can do all this (e.g.) in a single compound:
>>>
>>> open
>>> write
>>> close
>>>
>>> It'd be nice to be able to do something similar with io_uring. Make it
>>> so that when you do an open, you set the "current fd" inside the
>>> kernel's context, and then be able to issue io_uring requests that
>>> specify a magic "fd" value that use it.
>>>
>>> That would be a really useful pattern.
>>
>> For io_uring, you can link requests that you submit into a chain. Each
>> link in the chain is done in sequence. Which means that you could do:
>>
>> open, read, close
>>
>> in a single sequence. The only thing that is missing right now is a way
>> to have the return of that open propagated to the 'fd' of the read and
>> close, and it's actually one of the topics to discuss at LSFMM next
>> month.
>>
>> One approach would be to use BPF to handle this passing, another
>> suggestion has been to have the read/close specify some magic 'fd' value
>> that just means "inherit fd from result of previous". The latter sounds
>> very close to the stateid you mention above, and the upside here is that
>> it wouldn't explode the necessary toolchain to need to include BPF.
>>
>> In other words, this is really close to being reality and practically
>> feasible.
>>
>
> Excellent.
>
> Yes, the latter is exactly what I had in mind for this. I suspect that
> that would cover a large fraction of the potential use-cases for this.
>
> Basically, all you'd need to do is keep a pointer to struct file in the
> internal state for the chain. Then, allow userland to specify some magic
> fd value for subsequent chained operations that says to use that instead
> of consulting the fdtable. Maybe use -4096 (-MAX_ERRNO - 1)?

Yeah I think that'd be a suitable way to signal that.

> That would cover the smb or nfs server sort of use cases, I think. For
> the sysfs cases, I guess you'd need to dispatch several chains, but that
> doesn't sound _too_ onerous.

The magic fd would be per-chain, so doing multiple chains wouldn't
really matter at all. Let me try and hack this up, should be pretty
trivial.

> In fact, with that you should even be able to emulate the proposed
> readfile syscall in a userland library.

Exactly

-- 
Jens Axboe
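
The per-chain "magic fd" idea from the thread can be modeled in plain
userspace code. The following is only a sketch of the semantics being
discussed, not io_uring code and not anything that exists in the kernel:
run_chain(), the tuple format for operations, and the CHAIN_FD name are all
hypothetical, and the sentinel value simply reuses the -4096 floated above.
Ordinary blocking os.open/os.read/os.close calls stand in for linked
requests.

```python
import os
import tempfile

# Hypothetical sentinel: the -4096 (-MAX_ERRNO - 1) value suggested in the thread.
CHAIN_FD = -4096


def run_chain(ops):
    """Run ops strictly in order, as one chain.

    Each op is ("open", path), ("read", fd, nbytes) or ("close", fd).
    An fd equal to CHAIN_FD means "use the fd from this chain's last open".
    Returns one result per op: the new fd, the bytes read, or 0 for close.
    """
    current_fd = None  # per-chain state; separate chains never share it
    results = []
    try:
        for op in ops:
            kind = op[0]
            if kind == "open":
                current_fd = os.open(op[1], os.O_RDONLY)
                results.append(current_fd)
                continue
            fd = op[1]
            if fd == CHAIN_FD:
                if current_fd is None:
                    raise ValueError("CHAIN_FD used before an open in this chain")
                fd = current_fd
            if kind == "read":
                results.append(os.read(fd, op[2]))
            elif kind == "close":
                os.close(fd)
                if fd == current_fd:
                    current_fd = None
                results.append(0)
            else:
                raise ValueError(f"unknown op: {kind!r}")
    except Exception:
        # A failed step ends the chain; avoid leaking the chain's fd.
        if current_fd is not None:
            try:
                os.close(current_fd)
            except OSError:
                pass
        raise
    return results


def readfile(path, nbytes=4096):
    """Userland stand-in for the proposed readfile(): one open/read/close chain."""
    _, data, _ = run_chain([
        ("open", path),
        ("read", CHAIN_FD, nbytes),
        ("close", CHAIN_FD),
    ])
    return data


# Usage: read back a small temporary file through a single chain.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"42\n")
print(readfile(tmp.name))
os.unlink(tmp.name)
```

Because the current fd lives in the chain's own state rather than in any
global table, reading several sysfs-style files just means running several
independent chains, which matches the point made above that the magic fd
would be per-chain.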