Subject: Re: [PATCH 0/5] nfs: Add mount option for forcing RPC requests for one file over one connection
To: Chuck Lever III, Nagendra Tomar
Cc: Linux NFS Mailing List, Trond Myklebust, Anna Schumaker
References: <5B030422-09B7-470D-9C7A-18C666F5817D@oracle.com>
From: Tom Talpey
Message-ID: <8ad09054-967e-d58b-1bba-c63aa5362f6f@talpey.com>
Date: Wed, 24 Mar 2021 09:23:25 -0400
In-Reply-To: <5B030422-09B7-470D-9C7A-18C666F5817D@oracle.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On 3/23/2021 12:14 PM, Chuck Lever III wrote:
>
>
>> On Mar 23, 2021, at 11:57 AM, Nagendra Tomar wrote:
>>
>>>> On Mar 23, 2021, at 1:46 AM, Nagendra Tomar
>>> wrote:
>>>>
>>>> From: Nagendra S Tomar
>>>>
>>>> If a clustered NFS server is behind an L4 load balancer, the default
>>>> nconnect roundrobin policy may cause RPC requests to a file to be
>>>> sent to different cluster nodes. This is because the source port
>>>> would be different for each of the nconnect connections.
>>>> While this should functionally work (since the cluster will usually
>>>> have a consistent view irrespective of which node is serving the
>>>> request), it may not be desirable from a performance point of view.
>>>> As an example, we have an NFSv3 frontend to our object store, where
>>>> every NFSv3 file is an object. Now if writes to the same file are
>>>> sent roundrobin to different cluster nodes, the writes become very
>>>> inefficient due to the consistency requirement for object updates
>>>> being done from different nodes.
>>>> Similarly, each node may maintain some kind of cache to serve file
>>>> data/metadata requests faster, and even in that case it helps to have
>>>> an xprt affinity for a file/dir.
>>>> In general we have seen such a scheme scale very well.
>>>>
>>>> This patch introduces a new rpc_xprt_iter_ops for using an additional
>>>> u32 (filehandle hash) to affine RPCs for the same file to one xprt.
>>>> It adds a new mount option "ncpolicy=roundrobin|hash" which can be
>>>> used to select the nconnect multipath policy for a given mount and
>>>> pass the selected policy to the RPC client.
>>>
>>> This sets off my "not another administrative knob that has
>>> to be tested and maintained, and can be abused" allergy.
>>>
>>> Also, my "because connections are shared by mounts of the same
>>> server, all those mounts will all adopt this behavior" rhinitis.
>>
>> Yes, it's fair to call this out, but ncpolicy behaves like the nconnect
>> parameter in this regard.
>>
>>> And my "why add a new feature to a legacy NFS version" hives.
>>>
>>>
>>> I agree that your scenario can and should be addressed somehow.
>>> I'd really rather see this done with pNFS.
>>>
>>> Since you are proposing patches against the upstream NFS client,
>>> I presume all your clients /can/ support NFSv4.1+. It's the NFS
>>> servers that are stuck on NFSv3, correct?
>>
>> Yes.
>>
>>>
>>> The flexfiles layout can handle an NFSv4.1 client and NFSv3 data
>>> servers. In fact it was designed for exactly this kind of mix of
>>> NFS versions.
>>>
>>> No client code change will be necessary -- there are a lot more
>>> clients than servers. The MDS can be made to work smartly in
>>> concert with the load balancer, over time; or it can adopt other
>>> clever strategies.
>>>
>>> IMHO pNFS is the better long-term strategy here.
>>
>> The fundamental difference here is that the clustered NFSv3 server
>> is available over a single virtual IP, so IIUC even if we were to use
>> NFSv4.1 with the flexfiles layout, all it can hand over to the client
>> is that single (load-balanced) virtual IP, and when the clients then
>> connect to the NFSv3 DS we still have the same issue. Am I
>> understanding you right? Can you please elaborate what you mean by
>> "MDS can be made to work smartly in concert with the load balancer"?
>
> I had thought there were multiple NFSv3 server targets in play.
>
> If the load balancer is making them look like a single IP address,
> then take it out of the equation: expose all the NFSv3 servers to
> the clients and let the MDS direct operations to each data server.
>
> AIUI this is the approach (without the use of NFSv3) taken by
> NetApp next generation clusters.

It certainly sounds like the load balancer is actually performing a
storage router function here, and roundrobin is going to thrash that
badly.

I'm not sure that exposing a magic "hash" knob is a very good solution
though. Pushing decisions to the sysadmin is rarely a great approach.
Why not simply argue that "hash" is the better algorithm, and prove
that it should be the default? Is that not the case?

Tom.

>>>> It adds a new rpc_procinfo member p_fhhash, which can be supplied
>>>> by the specific RPC programs to return a u32 hash of the file/dir the
>>>> RPC is targeting, and lastly it provides p_fhhash implementations
>>>> for various NFS v3/v4/v4.1/v4.2 RPCs to generate the hash correctly.
>>>>
>>>> Thoughts?
>>>>
>>>> Thanks,
>>>> Tomar
>>>>
>>>> Nagendra S Tomar (5):
>>>>   SUNRPC: Add a new multipath xprt policy for xprt selection based
>>>>     on target filehandle hash
>>>>   SUNRPC/NFSv3/NFSv4: Introduce "enum ncpolicy" to represent the nconnect
>>>>     policy and pass it down from mount option to rpc layer
>>>>   SUNRPC/NFSv4: Rename RPC_TASK_NO_ROUND_ROBIN -> RPC_TASK_USE_MAIN_XPRT
>>>>   NFSv3: Add hash computation methods for NFSv3 RPCs
>>>>   NFSv4: Add hash computation methods for NFSv4/NFSv42 RPCs
>>>>
>>>>  fs/nfs/client.c                      |   3 +
>>>>  fs/nfs/fs_context.c                  |  26 ++
>>>>  fs/nfs/internal.h                    |   2 +
>>>>  fs/nfs/nfs3client.c                  |   4 +-
>>>>  fs/nfs/nfs3xdr.c                     | 154 +++++++++++
>>>>  fs/nfs/nfs42xdr.c                    | 112 ++++++++
>>>>  fs/nfs/nfs4client.c                  |  14 +-
>>>>  fs/nfs/nfs4proc.c                    |  18 +-
>>>>  fs/nfs/nfs4xdr.c                     | 516 ++++++++++++++++++++++++++++++-----
>>>>  fs/nfs/super.c                       |   7 +-
>>>>  include/linux/nfs_fs_sb.h            |   1 +
>>>>  include/linux/sunrpc/clnt.h          |  15 +
>>>>  include/linux/sunrpc/sched.h         |   2 +-
>>>>  include/linux/sunrpc/xprtmultipath.h |   9 +-
>>>>  include/trace/events/sunrpc.h        |   4 +-
>>>>  net/sunrpc/clnt.c                    |  38 ++-
>>>>  net/sunrpc/xprtmultipath.c           |  91 +++++-
>>>>  17 files changed, 913 insertions(+), 103 deletions(-)
>>>
>>> --
>>> Chuck Lever
>
> --
> Chuck Lever
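
To make the proposal concrete: the "hash" policy described in the cover
letter amounts to hashing the target filehandle and using that value,
modulo the number of nconnect transports, to pick the connection for
every RPC to that file, so they all land on the same backend behind the
L4 load balancer. Below is a minimal userspace sketch of that selection,
assuming a Jenkins-style one-at-a-time hash; the helper names and the
hash choice are illustrative only, while the actual patches compute the
hash through the per-procedure p_fhhash callbacks and a new
rpc_xprt_iter_ops mentioned above.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Illustrative sketch only: hash an NFS filehandle and map it to one of
 * the nconnect transports, so every RPC for the same file uses the same
 * connection (and therefore the same backend behind an L4 load
 * balancer). The hash function and names here are not the ones used by
 * the proposed patches.
 */

/* Jenkins one-at-a-time hash over the raw filehandle bytes. */
static uint32_t fh_hash(const unsigned char *fh, size_t len)
{
	uint32_t h = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		h += fh[i];
		h += h << 10;
		h ^= h >> 6;
	}
	h += h << 3;
	h ^= h >> 11;
	h += h << 15;
	return h;
}

/* Pick a transport index for this filehandle: the "hash" policy. */
static unsigned int pick_xprt(const unsigned char *fh, size_t len,
			      unsigned int nconnect)
{
	return fh_hash(fh, len) % nconnect;
}

int main(void)
{
	/* Two made-up filehandles, as an NFSv3 server might hand out. */
	const unsigned char fh_a[] = { 0x01, 0x02, 0x03, 0x04, 0xaa, 0xbb };
	const unsigned char fh_b[] = { 0x01, 0x02, 0x03, 0x05, 0xcc, 0xdd };
	unsigned int nconnect = 4;	/* e.g. mount -o nconnect=4 */

	/* Every RPC for fh_a maps to the same transport index. */
	printf("fh_a -> xprt %u\n", pick_xprt(fh_a, sizeof(fh_a), nconnect));
	printf("fh_b -> xprt %u\n", pick_xprt(fh_b, sizeof(fh_b), nconnect));
	return 0;
}

With the default roundrobin policy, by contrast, the transport index
advances on every RPC regardless of the target file, which is exactly
what spreads writes for a single object across cluster nodes.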