Return-Path:
Received: from aserp1040.oracle.com ([141.146.126.69]:29357 "EHLO aserp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757146AbcBRV3y convert rfc822-to-8bit (ORCPT ); Thu, 18 Feb 2016 16:29:54 -0500
Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 9.2 \(3112\))
Subject: Re: "Re: [PATCH RFC Version 1 0/6] Request for Comment: NFS4.1 Session Trunking"
From: Chuck Lever
In-Reply-To: <20160218203915.GA7771@fieldses.org>
Date: Thu, 18 Feb 2016 16:29:46 -0500
Cc: "Adamson, Andy", Trond Myklebust, Martin Houry, Linux NFS Mailing List
Message-Id: <336DFCFA-5655-4CB1-82F2-9E3B17030094@oracle.com>
References: <20160217205929.GF10401@fieldses.org> <3B48A59F-638A-45C9-B2E4-2D65C00DE639@netapp.com> <20160218141447.GB4256@fieldses.org> <839836DE-11FD-4310-A76E-630548C0777B@netapp.com> <20160218203915.GA7771@fieldses.org>
To: "J. Bruce Fields"
Sender: linux-nfs-owner@vger.kernel.org
List-ID:

> On Feb 18, 2016, at 3:39 PM, J. Bruce Fields wrote:
>
> On Thu, Feb 18, 2016 at 07:41:19PM +0000, Adamson, Andy wrote:
>>
>>> On Feb 18, 2016, at 1:32 PM, Trond Myklebust wrote:
>>>
>>> On Thu, Feb 18, 2016 at 9:14 AM, J. Bruce Fields wrote:
>>>>
>>>> On Wed, Feb 17, 2016 at 06:55:43PM -0500, Trond Myklebust wrote:
>>>>> On Wed, Feb 17, 2016 at 5:52 PM, Chuck Lever wrote:
>>>>>>
>>>>>>> On Feb 17, 2016, at 5:35 PM, Adamson, Andy wrote:
>>>>>>> The fs_locations would need to be requested by the client. I guess we request them at every mount….
>>>>>>
>>>>>> Yep, and fetch them again every so often. There's no real
>>>>>> cache coherency protocol for this information. (That's
>>>>>> where a pNFS layout might be more valuable).
>>>>>
>>>>> If your goal is to do session trunking, you only really need to check
>>>>> the fs_locations attribute on the root file system. (so
>>>>> GETROOTFH+GETATTR(fs_locations)).
>>>>> That's the natural place for a
>>>>> server to advertise its full set of IP addresses, and the session
>>>>> trunking protocol itself will allow you to winnow out any that might
>>>>> belong to a replica server.
>>>>
>>>> I worry that round-robin could behave really badly if the client's path
>>>> to the two IP addresses have different performance characteristics. But
>>>> a server should probably still be allowed to advertise those as replicas
>>>> (e.g. maybe a slower interface is usable as a fallback?).
>>>>
>>>> So maybe we should be careful about making this automatic. Unless the
>>>> load-balancing is a little smarter than pure round robin. Or unless we
>>>> can get some more fine-grained information (maybe someone could use
>>>> fs_locations_info's preference information for this?).
>>>
>>> The multipath policy is pluggable. If you need something more clever
>>> than round robin, then feel free to play. However do note that for
>>> pNFS multipathing, both the files and flexfiles specs are clear that
>>> you should not mix slow and fast transports. I imagine you probably
>>> want to do the same for fs_locations.

Right, and barring the ability to mix TCP and RDMA transports,
I think there's no reliable way either the client or the server
can tell that any particular path has compromised performance.
It's going to have to be up to administrators to make sure this
is configured correctly, at least for now.

>>> As for fs_locations_info, please see FSLI4BX_(READ|WRITE)(RANK|ORDER).
>>
>> OK. I’m testing session trunking using new multiple hostname mount options. I’ll submit another RFC patchset.
>> Then, caveat patchset response, I’ll switch from the multiple hostname mount options to fs_locations_info
>
> You mean you want to remove support for the commandline list of
> hostnames at that point?
>
> I'd rather keep support for listing them on the commandline. I think
> the fs_locations_info is a little more complicated than I did at first
> look.
> (Among other things, it requires server support, and some thought
> about how exactly to interpret that fs_locations_info preference
> information.)

True; there's a reason I never got to implementing fs_locations_info
on the Linux server for FedFS. There are sticky problems around the
mountd upcall that is used to communicate this information to the
kernel, for example.

However, I don't agree that this is a good reason to go with multiple
hostnames on the mount command line. I like Andy's plan to keep this
CLI change out of the long term upstream code, but continue to use
it for testing.

--
Chuck Lever