From: ebiederm@xmission.com (Eric W. Biederman)
To: David Howells
Cc: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com,
    sfrench@samba.org, steved@redhat.com, viro@zeniv.linux.org.uk,
    torvalds@linux-foundation.org, "Eric W. Biederman",
    linux-api@vger.kernel.org, linux-security-module@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org,
    linux-afs@lists.infradead.org, ceph-devel@vger.kernel.org,
    v9fs-developer@lists.sourceforge.net
Date: Thu, 16 Aug 2018 00:06:06 -0500
Subject: Re: Should we split the network filesystem setup into two phases?
Message-ID: <87pnyiew8x.fsf@xmission.com>
In-Reply-To: <17763.1534350685@warthog.procyon.org.uk> (David Howells's
    message of "Wed, 15 Aug 2018 17:31:25 +0100")
References: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk>
    <17763.1534350685@warthog.procyon.org.uk>

David Howells writes:

> Having just re-ported NFS on top of the new mount API stuff, I find that I
> don't really like the idea of superblocks being separated by communication
> parameters - especially when it might seem reasonable to be able to adjust
> those parameters.
>
> Does it make sense to abstract out the remote peer and allow (a) that to be
> configured separately from any superblocks using it and (b) that to be used
> to create superblocks?
>
> Note that what a 'remote peer' is would be different for different
> filesystems:
>
> (*) For NFS, it would probably be a named server, with address(es) attached
>     to the name.  In lieu of actually having a name, the initial IP address
>     could be used.
>
> (*) For CIFS, it would probably be a named server.
>     I'm not sure if CIFS allows an abstraction for a share that can move
>     about inside a domain.
>
> (*) For AFS, it would be a cell, I think, where the actual fileserver(s)
>     used are a matter of direction from the Volume Location server.
>
> (*) For 9P and Ceph, I don't really know.
>
> What could be configured?  Well, addresses, ports, timeouts.  Maybe
> protocol level negotiation - though not being able to explicitly specify,
> say, the particular version and minorversion on an NFS share would be
> problematic for backward compatibility.
>
> One advantage it could give us is that it might make it easier, if someone
> asks for server X, to query userspace in some way for what the default
> parameters for X are.
>
> What might this look like in terms of userspace?  Well, we could overload
> the new mount API:
>
>	peer1 = fsopen("nfs", FSOPEN_CREATE_PEER);
>	fsconfig(peer1, FSCONFIG_SET_NS, "net", NULL, netns_fd);
>	fsconfig(peer1, FSCONFIG_SET_STRING, "peer_name", "server.home");
>	fsconfig(peer1, FSCONFIG_SET_STRING, "vers", "4.2");
>	fsconfig(peer1, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.1");
>	fsconfig(peer1, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.2");
>	fsconfig(peer1, FSCONFIG_SET_STRING, "timeo", "122");
>	fsconfig(peer1, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);
>
>	peer2 = fsopen("nfs", FSOPEN_CREATE_PEER);
>	fsconfig(peer2, FSCONFIG_SET_NS, "net", NULL, netns_fd);
>	fsconfig(peer2, FSCONFIG_SET_STRING, "peer_name", "server2.home");
>	fsconfig(peer2, FSCONFIG_SET_STRING, "vers", "3");
>	fsconfig(peer2, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.3");
>	fsconfig(peer2, FSCONFIG_SET_STRING, "address", "udp:192.168.1.4+6001");
>	fsconfig(peer2, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);
>
>	fs = fsopen("nfs", 0);
>	fsconfig(fs, FSCONFIG_SET_PEER, "peer.1", NULL, peer1);
>	fsconfig(fs, FSCONFIG_SET_PEER, "peer.2", NULL, peer2);
>	fsconfig(fs, FSCONFIG_SET_STRING, "source", "/home/dhowells", 0);
>	m = fsmount(fs, 0, 0);
>
> [Note that Eric's oft-repeated point about the 'creation' operation
> altering established parameters still stands here.]
>
> You could also then reopen it for configuration, maybe by:
>
>	peer = fspick(AT_FDCWD, "/mnt", FSPICK_PEER);
>
> or:
>
>	peer = fspick(AT_FDCWD, "nfs:server.home", FSPICK_PEER_BY_NAME);
>
> though it might be better to give it its own syscall:
>
>	peer = fspeer("nfs", "server.home", O_CLOEXEC);
>	fsconfig(peer, FSCONFIG_SET_NS, "net", NULL, netns_fd);
>	...
>	fsconfig(peer, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);
>
> In terms of alternative interfaces, I'm not sure how easy it would be to
> make it like cgroups, where you go and create a dir in a special
> filesystem, say, "/sys/peers/nfs", because the peer records and names
> would have to be network namespaced.  Also, it might make it more
> difficult to use to create a root fs.
>
> On the other hand, being able to adjust the peer configuration by:
>
>	echo 71 >/sys/peers/nfs/server.home/timeo
>
> does have a certain appeal.
>
> Also, netlink might be the right option, but I'm not sure how you'd pin
> the resultant object whilst you make use of it.
>
> A further thought: is it worth making this idea more general and
> encompassing non-network devices also?  This would run into issues of
> some logical sources being visible across namespaces but not others.

Even network filesystems are going to face the challenge of being visible
in some network namespaces and not others, as some filesystems will be
visible on the internet while others will only be visible on the
appropriate local network.  Network namespaces are sometimes used to deal
with local networks that have overlapping IP addresses.

I think you are proposing a model for network filesystems that is
essentially the situation we are already in with most block-device
filesystems today, where some parameters identify the local filesystem
instance and some parameters describe how the kernel interacts with that
filesystem instance.
For system efficiency there is a strong argument for having the fewest
filesystem instances we can; otherwise we will be caching the same data
twice, wasting space in RAM, etc.  So I like the idea.

At least for devpts we always create a new filesystem instance every time
mount(2) is called.  NFS also has the option to create a new filesystem
instance every time mount(2) is called, even if the filesystem parameters
are the same, and depending on the case I can see the attraction of that
for other filesystems as well.  So I don't think we can completely abandon
the option for filesystems to always create a new filesystem instance when
mount(8) is called.

I most definitely support thinking this through and figuring out how it
makes the most sense for the new filesystem API to create new filesystem
instances, or to fail to create them.

Eric