From: David Howells
Organization: Red Hat UK Ltd. Registered Address: Red Hat UK Ltd, Amberley Place, 107-111 Peascod Street, Windsor, Berkshire, SI4 1TE, United Kingdom. Registered in England and Wales under Company Registration No. 3798903
In-Reply-To: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk>
References: <153313703562.13253.5766498657900728120.stgit@warthog.procyon.org.uk>
To: trond.myklebust@hammerspace.com, anna.schumaker@netapp.com, sfrench@samba.org, steved@redhat.com, viro@zeniv.linux.org.uk
Cc: dhowells@redhat.com, torvalds@linux-foundation.org, "Eric W. Biederman", linux-api@vger.kernel.org, linux-security-module@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nfs@vger.kernel.org, linux-cifs@vger.kernel.org, linux-afs@lists.infradead.org, ceph-devel@vger.kernel.org, v9fs-developer@lists.sourceforge.net
Subject: Should we split the network filesystem setup into two phases?
Date: Wed, 15 Aug 2018 17:31:25 +0100
Message-ID: <17763.1534350685@warthog.procyon.org.uk>
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Having just re-ported NFS on top of the new mount API stuff, I find that I
don't really like the idea of superblocks being separated by communication
parameters - especially when it might seem reasonable to be able to adjust
those parameters.

Does it make sense to abstract out the remote peer and allow (a) that to be
configured separately from any superblocks using it and (b) that to be used
to create superblocks?

Note that what a 'remote peer' is would be different for different
filesystems:

 (*) For NFS, it would probably be a named server, with address(es) attached
     to the name.  In lieu of actually having a name, the initial IP address
     could be used.

 (*) For CIFS, it would probably be a named server.  I'm not sure if CIFS
     allows an abstraction for a share that can move about inside a domain.

 (*) For AFS, it would be a cell, I think, where the actual fileserver(s)
     used are a matter of direction from the Volume Location server.

 (*) For 9P and Ceph, I don't really know.

What could be configured?  Well, addresses, ports, timeouts.  Maybe protocol
level negotiation - though not being able to explicitly specify, say, the
particular version and minorversion on an NFS share would be problematic for
backward compatibility.

One advantage it could give us is that, if someone asks for server X, it
might make it easier to query userspace in some way for what the default
parameters for X are.

What might this look like in terms of userspace?  Well, we could overload the
new mount API:

	peer1 = fsopen("nfs", FSOPEN_CREATE_PEER);
	fsconfig(peer1, FSCONFIG_SET_NS, "net", NULL, netns_fd);
	fsconfig(peer1, FSCONFIG_SET_STRING, "peer_name", "server.home");
	fsconfig(peer1, FSCONFIG_SET_STRING, "vers", "4.2");
	fsconfig(peer1, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.1");
	fsconfig(peer1, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.2");
	fsconfig(peer1, FSCONFIG_SET_STRING, "timeo", "122");
	fsconfig(peer1, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);

	peer2 = fsopen("nfs", FSOPEN_CREATE_PEER);
	fsconfig(peer2, FSCONFIG_SET_NS, "net", NULL, netns_fd);
	fsconfig(peer2, FSCONFIG_SET_STRING, "peer_name", "server2.home");
	fsconfig(peer2, FSCONFIG_SET_STRING, "vers", "3");
	fsconfig(peer2, FSCONFIG_SET_STRING, "address", "tcp:192.168.1.3");
	fsconfig(peer2, FSCONFIG_SET_STRING, "address", "udp:192.168.1.4+6001");
	fsconfig(peer2, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);

	fs = fsopen("nfs", 0);
	fsconfig(fs, FSCONFIG_SET_PEER, "peer.1", NULL, peer1);
	fsconfig(fs, FSCONFIG_SET_PEER, "peer.2", NULL, peer2);
	fsconfig(fs, FSCONFIG_SET_STRING, "source", "/home/dhowells", 0);
	m = fsmount(fs, 0, 0);

[Note that Eric's oft-repeated point about the 'creation' operation altering
established parameters still stands here.]

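The "address" values in the example above ("tcp:192.168.1.1" and
"udp:192.168.1.4+6001") suggest a "proto:host[+port]" syntax, though the mail
does not define it.  As a minimal sketch of how such a string might be split
into its parts, assuming that reading of the syntax is right, the userspace C
below uses a hypothetical struct peer_addr and parse_peer_addr(); neither
name comes from the proposal or from the kernel, and no address resolution is
attempted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of one peer "address" value. */
struct peer_addr {
	char proto[8];	/* transport name, e.g. "tcp" or "udp" */
	char host[64];	/* address text, kept as an unresolved string */
	int  port;	/* 0 means "no port given" */
};

/*
 * Parse "proto:host" or "proto:host+port" (syntax assumed from the
 * example, not specified by the proposal).
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_peer_addr(const char *s, struct peer_addr *out)
{
	const char *colon, *host, *plus;
	size_t proto_len, host_len;

	assert(s && out);

	/* The transport name is everything before the first ':'. */
	colon = strchr(s, ':');
	if (!colon || colon == s)
		return -1;
	proto_len = (size_t)(colon - s);
	if (proto_len >= sizeof(out->proto))
		return -1;

	/* An optional "+port" suffix follows the last '+'. */
	host = colon + 1;
	plus = strrchr(host, '+');
	host_len = plus ? (size_t)(plus - host) : strlen(host);
	if (host_len == 0 || host_len >= sizeof(out->host))
		return -1;

	memcpy(out->proto, s, proto_len);
	out->proto[proto_len] = '\0';
	memcpy(out->host, host, host_len);
	out->host[host_len] = '\0';

	out->port = 0;
	if (plus) {
		char *end;
		long port = strtol(plus + 1, &end, 10);

		/* Reject empty, non-numeric or out-of-range ports. */
		if (end == plus + 1 || *end != '\0' || port < 1 || port > 65535)
			return -1;
		out->port = (int)port;
	}
	return 0;
}
```

Calling parse_peer_addr("udp:192.168.1.4+6001", &a) would fill in "udp",
"192.168.1.4" and 6001, while a string with no transport prefix is rejected.
Keeping the host as text leaves name and address resolution to whoever
consumes the structure.
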
You could also then reopen it for configuration, maybe by:

	peer = fspick(AT_FDCWD, "/mnt", FSPICK_PEER);

or:

	peer = fspick(AT_FDCWD, "nfs:server.home", FSPICK_PEER_BY_NAME);

though it might be better to give it its own syscall:

	peer = fspeer("nfs", "server.home", O_CLOEXEC);
	fsconfig(peer, FSCONFIG_SET_NS, "net", NULL, netns_fd);
	...
	fsconfig(peer, FSCONFIG_CMD_SET_UP_PEER, NULL, NULL, 0);

In terms of alternative interfaces, I'm not sure how easy it would be to make
it like cgroups where you go and create a dir in a special filesystem, say,
"/sys/peers/nfs", because the peer records and names would have to be network
namespaced.  Also, it might make it more difficult to use to create a root
fs.

On the other hand, being able to adjust the peer configuration by:

	echo 71 >/sys/peers/nfs/server.home/timeo

does have a certain appeal.

Also, netlink might be the right option, but I'm not sure how you'd pin the
resultant object whilst you make use of it.

A further thought: is it worth making this idea more general so that it
encompasses non-network devices as well?  This would run into issues of some
logical sources being visible in some namespaces but not others.

David