Hi Trond, Anna,
We currently have several field installations containing NFS and
SunRPC-related patches that greatly improve performance of NFSv3 clients
over RDMA setups, where link aggregation is not supported.
I would like to work on integrating several of these changes upstream and
to discuss their implementation. We managed to get a bandwidth of 33 GB/sec
from a single-node NFSv3 mount, and later around 92 GB/sec from a single
mount using further enhancements in RPC request dispatch.
The main change allows specifying multiple target IP addresses in a
single mount, which, combined with nconnect and multiple floating IPs,
provides load balancing over several target nodes. This is useful for
systems where load balancing is managed by moving a group of floating IP
addresses between nodes. This works especially well on RoCE setups.
The networking setup on these clients consists of multiple RDMA network
interfaces that are connected to the same network, each with its own
IP address.
The proposed change adds a new `remoteports=<IP-addresses-ranges>`
mount option that provides a group of IP addresses, from which `nconnect`
at the SunRPC level picks target transport addresses in round-robin
fashion. There's also an accompanying `localports` parameter that binds
each transport to a local address, so that the source port is better
controlled and transports do not all pile onto a single local interface.
Essentially, this is a form of session trunking that can be thought of as
an extension of the existing `nconnect` parameter.
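To illustrate the intended semantics, here is a rough userspace sketch in
C, not the patch itself. It assumes a range is written as `first-last`
IPv4 addresses (the exact syntax of `<IP-addresses-ranges>` is still
open), and the names `expand_ipv4_range`, `assign_round_robin` and
`struct xprt_addrs` are hypothetical. Transport i simply gets remote
address i mod N and local address i mod M.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical per-transport address pair. */
struct xprt_addrs {
	uint32_t remote; /* target (floating) IPv4 address, host order */
	uint32_t local;  /* client interface address to bind, host order */
};

/* Hypothetical parser: expand "A.B.C.D-A.B.C.E" (or a single address)
 * into host-order IPv4 addresses. Returns the count, or -1 on bad
 * input or if more than max addresses would be produced. */
int expand_ipv4_range(const char *spec, uint32_t *out, int max)
{
	char buf[64];
	char *dash;
	struct in_addr lo, hi;
	uint32_t first, last, a;
	int n = 0;

	if (strlen(spec) >= sizeof(buf))
		return -1;
	strcpy(buf, spec);
	dash = strchr(buf, '-');
	if (dash)
		*dash = '\0';
	if (inet_pton(AF_INET, buf, &lo) != 1)
		return -1;
	if (dash) {
		if (inet_pton(AF_INET, dash + 1, &hi) != 1)
			return -1;
	} else {
		hi = lo;
	}
	first = ntohl(lo.s_addr);
	last = ntohl(hi.s_addr);
	if (last < first)
		return -1;
	/* Stop on equality before incrementing, so 255.255.255.255 cannot wrap. */
	for (a = first;; a++) {
		if (n == max)
			return -1;
		out[n++] = a;
		if (a == last)
			break;
	}
	return n;
}

/* Assign nconnect transports to remote and local addresses round-robin. */
void assign_round_robin(struct xprt_addrs *x, int nconnect,
			const uint32_t *remote, int nremote,
			const uint32_t *local, int nlocal)
{
	for (int i = 0; i < nconnect; i++) {
		x[i].remote = remote[i % nremote];
		x[i].local = local[i % nlocal];
	}
}
```

With `nconnect=8`, four remote addresses and two local ones, transports 0
through 7 cycle through the remote addresses twice while alternating
between the two local interfaces.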
To my understanding, NFSv4.x with pNFS has advanced dynamic transport
management logic, along with file layouts supporting striping over file
offsets; however, there are cases in which we would like to achieve good
performance even with the older protocol.
Before I adjust the patches I'm testing for v5.11, do you see other
implementation or user interface considerations I should take into
account?
Thanks
--
Dan Aloni
> On Jan 12, 2021, at 9:17 AM, Dan Aloni <[email protected]> wrote:
> To my understanding, NFSv4.x with pNFS has advanced dynamic transport
> management logic, along with file layouts supporting striping over file
> offsets; however, there are cases in which we would like to achieve good
> performance even with the older protocol.
Hi Dan, my curiosity is piqued about the RPC request dispatch changes
you have in mind. Can you post them here for review?
Also, if you can tell us, what NFS server supports NFS/RDMA but not
NFSv4?
--
Chuck Lever
On Wed, Jan 13, 2021 at 09:59:58AM -0500, Chuck Lever wrote:
> > To my understanding, NFSv4.x with pNFS has advanced dynamic transport
> > management logic, along with file layouts supporting striping over file
> > offsets; however, there are cases in which we would like to achieve good
> > performance even with the older protocol.
>
> Hi Dan, my curiosity is piqued about the RPC request dispatch changes
> you have in mind. Can you post them here for review?
These changes depend on the initial changes I'd like to contribute. The
gist of them concerns the xprt multipath algorithm, where, in addition to
round robin, we add further considerations when picking a transport.
For example, for data I/Os, the NUMA node that the memory pages attached
to the RPC request belong to can be used to pick a transport whose
outgoing local port is closer to that memory in the client machine's
hardware architecture than the other local ports are. The idea is to
lessen data transfer bottlenecks in hardware.
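A rough sketch of that picking logic, in userspace C with hypothetical
names (`struct xprt_info`, `struct xprt_picker`, `pick_xprt`). It assumes
each transport records the NUMA node of its local port, and that the
caller passes the NUMA node of the request's pages (in the kernel that
might come from something like `page_to_nid()`). Transports on the
matching node are used round-robin; requests with no preference, or with
no transport on their node, fall back to plain round robin.

```c
/* Hypothetical per-transport state. */
struct xprt_info {
	int numa_node; /* NUMA node of the NIC behind this local port */
};

struct xprt_picker {
	const struct xprt_info *xprts;
	int nxprts;
	unsigned int rr_next; /* shared round-robin cursor */
};

/* Pick a transport index. page_node < 0 means "no preference",
 * e.g. for metadata RPCs without data pages. Returns -1 if empty. */
int pick_xprt(struct xprt_picker *p, int page_node)
{
	unsigned int n = (unsigned int)p->nxprts;
	unsigned int start = p->rr_next;

	if (p->nxprts <= 0)
		return -1;
	if (page_node >= 0) {
		/* Scan from the cursor for the next transport on the same node,
		 * so matching transports are themselves used round-robin. */
		for (unsigned int k = 0; k < n; k++) {
			unsigned int i = (start + k) % n;

			if (p->xprts[i].numa_node == page_node) {
				p->rr_next = i + 1;
				return (int)i;
			}
		}
	}
	/* No preference or no local match: plain round robin. */
	p->rr_next = start + 1;
	return (int)(start % n);
}
```

Keeping a single cursor for both paths is a simplification; a real
implementation might keep one cursor per NUMA node.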
> Also, if you can tell us, what NFS server supports NFS/RDMA but not
> NFSv4?
For example, there are VAST Data clusters that currently support NFSv3
(as one logical server), with NFSv4 support coming soon.
--
Dan Aloni