Date: Thu, 18 May 2017 11:08:50 -0400
From: "J. Bruce Fields"
To: Trond Myklebust
Cc: "stefanha@redhat.com", "chuck.lever@oracle.com", "bfields@fieldses.org",
    "SteveD@redhat.com", "linux-nfs@vger.kernel.org"
Subject: Re: EXCHANGE_ID with same network address but different server owner
Message-ID: <20170518150850.GB16256@parsley.fieldses.org>
In-Reply-To: <1495119887.11859.1.camel@primarydata.com>
References: <20170512132721.GA654@stefanha-x1.localdomain>
 <20170512143410.GC17983@parsley.fieldses.org>
 <1494601295.10434.1.camel@primarydata.com>
 <021509A5-FA89-4289-B190-26DC317A09F6@oracle.com>
 <20170515144306.GB16013@stefanha-x1.localdomain>
 <20170515160248.GD9697@parsley.fieldses.org>
 <20170516131142.GA12711@fieldses.org>
 <20170518133441.GC4155@stefanha-x1.localdomain>
 <1495119887.11859.1.camel@primarydata.com>

On Thu, May 18, 2017 at 03:04:50PM +0000, Trond Myklebust wrote:
> On Thu, 2017-05-18 at 10:28 -0400, Chuck Lever wrote:
> > > On May 18, 2017, at 9:34 AM, Stefan Hajnoczi wrote:
> > > 
> > > On Tue, May 16, 2017 at 09:11:42AM -0400, J. Bruce Fields wrote:
> > > > I think you explained this before, perhaps you could just offer a
> > > > pointer: remind us what your requirements or use cases are,
> > > > especially for VM migration?
> > > 
> > > The NFS over AF_VSOCK configuration is:
> > > 
> > > A guest running on a host mounts an NFS export from that host.  The
> > > NFS server may be kernel nfsd or an NFS frontend to a distributed
> > > storage system like Ceph.  A little more about these cases below.
> > > 
> > > Kernel nfsd is useful for sharing files.  For example, the guest may
> > > read some files from the host when it launches and/or it may write
> > > out result files to the host when it shuts down.  The user may also
> > > wish to share their home directory between the guest and the host.
> > > 
> > > NFS frontends are a different use case.  They hide distributed
> > > storage systems from guests in cloud environments.  This way guests
> > > don't see the details of the Ceph, Gluster, etc. nodes.  Besides
> > > benefiting security, it also allows NFS-capable guests to run
> > > without installing specific drivers for the distributed storage
> > > system.  This use case is "filesystem as a service".
> > > 
> > > The reason for using AF_VSOCK instead of TCP/IP is that traditional
> > > networking configuration is fragile.  Automatically adding a
> > > dedicated NIC to the guest and choosing an IP subnet has a high
> > > chance of conflicts (subnet collisions, network interface naming,
> > > firewall rules, network management tools).  AF_VSOCK is a
> > > zero-configuration communications channel, so it avoids these
> > > problems.
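For reference, a minimal sketch of what that zero-configuration property
looks like from the guest side: a plain AF_VSOCK stream socket addressed
by the host's well-known CID.  The port number (2049, chosen here only by
analogy with the usual NFS port) is an assumption, not something this
thread specifies.

/*
 * Sketch only: connect from a guest to its host over AF_VSOCK.  No NIC,
 * IP address, subnet, or firewall rule is involved; the host is always
 * reachable at the well-known CID, and the port is the only value the
 * two sides must agree on.
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        struct sockaddr_vm sa = {
                .svm_family = AF_VSOCK,
                .svm_cid    = VMADDR_CID_HOST,  /* the hypervisor/host side */
                .svm_port   = 2049,             /* assumed port, see above  */
        };
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0) {
                perror("socket");
                return 1;
        }
        if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
                perror("connect");
                close(fd);
                return 1;
        }
        printf("connected to host CID %u, port %u\n", sa.svm_cid, sa.svm_port);
        close(fd);
        return 0;
}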
> > > On to migration.  For the most part, guests can be live migrated
> > > between hosts without significant downtime or manual steps.  PCI
> > > passthrough is an example of a feature that makes it very hard to
> > > live migrate.  I hope we can allow migration with NFS, although
> > > some limitations may be necessary to make it feasible.
> > > 
> > > There are two NFS over AF_VSOCK migration scenarios:
> > > 
> > > 1. The files live on host H1 and host H2 cannot access the files
> > >    directly.  There is no way for an NFS server on H2 to access
> > >    those same files unless the directory is copied along with the
> > >    guest or H2 proxies to the NFS server on H1.
> > 
> > Having managed (and shared) storage on the physical host is
> > awkward.  I know some cloud providers might do this today by
> > copying guest disk images down to the host's local disk, but
> > generally it's not a flexible primary deployment choice.
> > 
> > There's no good way to expand or replicate this pool of
> > storage.  A backup scheme would need to access all physical
> > hosts.  And the files are visible only on specific hosts.
> > 
> > IMO you want to treat local storage on each physical host as
> > a cache tier rather than as a back-end tier.
> > 
> > > 2. The files are accessible from both host H1 and host H2 because
> > >    they are on shared storage or a distributed storage system.
> > >    Here the problem is "just" migrating the state from H1's NFS
> > >    server to H2 so that file handles remain valid.
> > 
> > Essentially this is the re-export case, and this makes a lot
> > more sense to me from a storage administration point of view.
> > 
> > The pool of administered storage is not local to the physical
> > hosts running the guests, which is how I think cloud providers
> > would prefer to operate.
> > 
> > User storage would be accessible via an NFS share, but managed
> > in a Ceph object store (with redundancy, a common high-throughput
> > backup facility, and secure central management of user
> > identities).
> > 
> > Each host's NFS server could be configured to expose only the
> > cloud storage resources for the tenants on that host.  The
> > back-end storage (i.e., Ceph) could operate on a private storage
> > area network for better security.
> > 
> > The only missing piece here is support in Linux-based NFS
> > servers for transparent state migration.
> 
> Not really.  In a containerised world, we're going to see more and more
> cases where just a single process/application gets migrated from one
> NFS client to another (and yes, a re-exporter/proxy of NFS is just
> another client as far as the original server is concerned).
> 
> IOW: I think we want to allow a client to migrate some parts of its
> lock state to another client, without necessarily requiring every
> process being migrated to have its own clientid.

It wouldn't have to be every process, it'd be every container, right?

What's the disadvantage of per-container clientids?  I guess you lose
the chance to share delegations and caches.

--b.
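As an aside on scenario 2 above, which hinges on file handles remaining
valid: NFS file handles are opaque to clients, but a server typically
builds them from stable filesystem and file identifiers.  The sketch below
is a simplified, assumed layout, not the real knfsd wire encoding; the
point is only that a handle issued by H1 keeps working on H2 exactly when
H2 can reproduce every field, which is what shared or distributed storage
provides.  The open/lock/delegation state tied to the clientid is the part
that still has to move, which is what the rest of the thread discusses.

/*
 * Illustrative only: a simplified file handle layout in the spirit of
 * what a Linux NFS server encodes.  The real format differs.
 */
#include <stdint.h>

struct example_nfs_fh {
        uint32_t fsid_major;    /* which exported filesystem (device/UUID) */
        uint32_t fsid_minor;
        uint64_t fileid;        /* stable inode number within that fs      */
        uint32_t generation;    /* inode generation, guards against reuse  */
};

/* H2 can honour a handle issued by H1 only if it derives the same fields. */
static inline int fh_still_valid(const struct example_nfs_fh *issued_by_h1,
                                 const struct example_nfs_fh *derived_on_h2)
{
        return issued_by_h1->fsid_major == derived_on_h2->fsid_major &&
               issued_by_h1->fsid_minor == derived_on_h2->fsid_minor &&
               issued_by_h1->fileid     == derived_on_h2->fileid &&
               issued_by_h1->generation == derived_on_h2->generation;
}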