2021-04-20 20:59:53

by guy keren

Subject: Linux NFS4.1 client's "server trunking" seems to do the opposite of what the name implies


Hi,

when attempting to make two NFS 4.1 mounts from a linux NFS client, to
two IP addresses belonging to two different hosts in the same cluster
(i.e. the server major id in the EXCHANGE_ID response is the same) - the
linux NFS4.1 client discards the new TCP connection (to the 2nd IP) and
instead decides to use the first client connection for both mounts. this
seems to be hard-coded inside the function named
"nfs41_discover_server_trunking", and leads to reduced performance,
relative to using NFS3 (which will use two different TCP connections to
the two different hosts in the storage cluster).

i was under the impression that (client_id) trunking is supposed to
allow to multiplex commands over multiple TCP connections - not to
consolidate different workloads onto the same TCP connection.

is there a way to avoid this behaviour, other than faking that the
"server major id" is different on each node in the cluster? (this is
what appears to be done by NetApp, for instance).
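
for reference, this is the kind of setup that reproduces it (the
addresses and export path below are made up):

```shell
# two NFS 4.1 mounts to two different nodes of the same cluster
# (hypothetical IPs and export path)
mount -t nfs -o vers=4.1 10.0.0.1:/export /mnt/node1
mount -t nfs -o vers=4.1 10.0.0.2:/export /mnt/node2

# afterwards, 'ss -tn dst 10.0.0.2' shows no established connection -
# both mounts ride on the single TCP connection to 10.0.0.1
ss -tn dst 10.0.0.2
```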

thanks,

--guy keren

Vast Data.


2021-04-21 00:41:50

by Olga Kornievskaia

Subject: Re: Linux NFS4.1 client's "server trunking" seems to do the opposite of what the name implies

On Tue, Apr 20, 2021 at 4:59 PM guy keren <[email protected]> wrote:
>
>
> Hi,
>
> when attempting to make two NFS 4.1 mounts from a linux NFS client, to
> two IP addresses belonging to two different hosts in the same cluster
> (i.e. the server major id in the EXCHANGE_ID response is the same) - the
> linux NFS4.1 client discards the new TCP connection (to the 2nd IP) and
> instead decides to use the first client connection for both mounts. this
> seems to be handled in a hard-coded inside the function named
> "nfs41_discover_server_trunking", and leads to reduced performance,
> relative to using NFS3 (which will use two different TCP connections to
> the two different hosts in the storage cluster).
>
> i was under the impression that (client_id) trunking is supposed to
> allow to multiplex commands over multiple TCP connections - not to
> consolidate different workloads onto the same TCP connection.
>
> is there a way to avoid this behaviour, other then faking that the
> "server major id" is different on each node in the cluster? (this is
> what appears to be done by NetApp, for instance).

Hi Guy,

The current implementation of the Linux client does not support session
trunking to the MDS (nor does it support client id trunking). I'm
hoping session trunking support comes in the near future. Clientid
trunking might not be something that's supported unless there is a
clustered NFS server out there that can utilize that behaviour.

Btw, you can get multipath NFS flows by combining nconnect with the
newly proposed sysfs interface (still in review) that can manipulate
server endpoints.
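
For example (hypothetical server name and export; nconnect opens
multiple TCP connections to the same server address, all multiplexed
under one session):

```shell
# open 4 TCP connections to the server for this v4.1 mount
mount -t nfs -o vers=4.1,nconnect=4 server:/export /mnt
```

The sysfs side is still under review, so there are no stable paths to
show for it yet.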


>
> thanks,
>
> --guy keren
>
> Vast Data.
>

2021-04-21 07:58:32

by guy keren

Subject: Re: Linux NFS4.1 client's "server trunking" seems to do the opposite of what the name implies

hi Olga, thanks for the response. more comments/questions below:

On 4/21/21 2:28 AM, Olga Kornievskaia wrote:
> On Tue, Apr 20, 2021 at 4:59 PM guy keren <[email protected]> wrote:
>> Hi,
>>
>> when attempting to make two NFS 4.1 mounts from a linux NFS client, to
>> two IP addresses belonging to two different hosts in the same cluster
>> (i.e. the server major id in the EXCHANGE_ID response is the same) - the
>> linux NFS4.1 client discards the new TCP connection (to the 2nd IP) and
>> instead decides to use the first client connection for both mounts. this
>> seems to be handled in a hard-coded inside the function named
>> "nfs41_discover_server_trunking", and leads to reduced performance,
>> relative to using NFS3 (which will use two different TCP connections to
>> the two different hosts in the storage cluster).
>>
>> i was under the impression that (client_id) trunking is supposed to
>> allow to multiplex commands over multiple TCP connections - not to
>> consolidate different workloads onto the same TCP connection.
>>
>> is there a way to avoid this behaviour, other then faking that the
>> "server major id" is different on each node in the cluster? (this is
>> what appears to be done by NetApp, for instance).
> Hi Guy,
>
> Current implementation of the linux client does not support session
> trunking to the MDS (nor does it support client id trunking). I'm
> hoping session trunking support comes in the near future. Clientid
> trunking might not be something that's supported unless we'll have a
> clustered NFS server out there that can utilize that behaviour.

i see.

> Btw you can do multipath NFS flows by using the combination of
> nconnect and the newly proposed sysfs interface (still in review) that
> can manipulate server endpoints.

the problem with nconnect is that although we will have multiple TCP
connections, they will all utilize the same session, which limits the
request parallelism that can be achieved (since the slot table size is
the limiting factor for the number of in-flight commands).

the same problem will also exist with session trunking - whereas with
client-id trunking alone (a separate NFS4.1 session per TCP connection),
the number of in-flight commands can be increased linearly with the
number of TCP connections.
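
to put numbers on it (toy arithmetic only, assuming a 64-slot session
and 4 connections):

```shell
# with nconnect, all connections share one session and one slot table
slots=64
conns=4
echo $(( slots ))           # in-flight cap with a single shared session: 64
echo $(( slots * conns ))   # cap with one session per connection: 256
```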

is there any way to work around that?


p.s. is there anyone actively working on session trunking support, or is
it just a "future roadmap item" with no concrete plans?

thanks,

--guy keren


Vast Data.

2021-04-21 13:03:38

by Trond Myklebust

Subject: Re: Linux NFS4.1 client's "server trunking" seems to do the opposite of what the name implies

On Wed, 2021-04-21 at 10:42 +0300, guy keren wrote:
> hi Olga, thanks for the response. more comments/questions below:
>
> On 4/21/21 2:28 AM, Olga Kornievskaia wrote:
>  > On Tue, Apr 20, 2021 at 4:59 PM guy keren <[email protected]> wrote:
>  >> Hi,
>  >>
>  >> when attempting to make two NFS 4.1 mounts from a linux NFS client, to
>  >> two IP addresses belonging to two different hosts in the same cluster
>  >> (i.e. the server major id in the EXCHANGE_ID response is the same) - the
>  >> linux NFS4.1 client discards the new TCP connection (to the 2nd IP) and
>  >> instead decides to use the first client connection for both mounts. this
>  >> seems to be handled in a hard-coded inside the function named
>  >> "nfs41_discover_server_trunking", and leads to reduced performance,
>  >> relative to using NFS3 (which will use two different TCP connections to
>  >> the two different hosts in the storage cluster).
>  >>
>  >> i was under the impression that (client_id) trunking is supposed to
>  >> allow to multiplex commands over multiple TCP connections - not to
>  >> consolidate different workloads onto the same TCP connection.
>  >>
>  >> is there a way to avoid this behaviour, other then faking that the
>  >> "server major id" is different on each node in the cluster? (this is
>  >> what appears to be done by NetApp, for instance).
>  > Hi Guy,
>  >
>  > Current implementation of the linux client does not support session
>  > trunking to the MDS (nor does it support client id trunking). I'm
>  > hoping session trunking support comes in the near future. Clientid
>  > trunking might not be something that's supported unless we'll have a
>  > clustered NFS server out there that can utilize that behaviour.
>
> i see.
>
>  > Btw you can do multipath NFS flows by using the combination of
>  > nconnect and the newly proposed sysfs interface (still in review) that
>  > can manipulate server endpoints.
>
> the problem with nconnect is that although we will have multiple TCP
> connections, they will all utilize the same session, which limits the
> requests parallelism that can be achieved (since the slot table size is
> the limiting factor for the number of in-flight commands).
>
> the same problem will also exist with session trunking - while when
> doing only client-id trunking (with a separate NFS4.1 session per TCP
> connection) - the number of in-flight commands can be increased linearly
> to the number of TCP connections.
>
> is there any way to work around that?
>

The Linux NFS client already supports dynamic slot allocation, and will
adjust its slot table size to match the values of sr_highest_slotid and
sr_target_highest_slotid. You can also recall slots using
CB_RECALL_SLOT in order to shrink the table size.
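
To illustrate (a toy model, not the actual kernel code): slotids are
0-based, so on each SEQUENCE reply the client can resize its table to
sr_target_highest_slotid + 1:

```shell
# toy model of dynamic slot table sizing
adjust_slots() {
    # $1 = current table size, $2 = sr_target_highest_slotid from the reply
    echo $(( $2 + 1 ))  # slotids are 0-based, so new size = target + 1
}

adjust_slots 64 127   # server advertises more slots -> grow to 128
adjust_slots 128 15   # server recalled slots (CB_RECALL_SLOT) -> shrink to 16
```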

We consider this to be the right solution for scaling the number of
session slots, and are not considering implementing client id trunking.
The latter is a lot more onerous to manage for the client and does not
help solve the problem of flow control.

...and no, nobody promised anyone that performing a new mount would
magically increase the number of TCP connections available to existing
NFSv4 mounts. That's the reason why we're looking at Olga's sysfs
solution to add a proper control mechanism to allow dynamic
manipulation of the transports.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2021-04-21 13:25:44

by Olga Kornievskaia

Subject: Re: Linux NFS4.1 client's "server trunking" seems to do the opposite of what the name implies

On Wed, Apr 21, 2021 at 3:28 AM guy keren <[email protected]> wrote:
>
>
> hi Olga, thanks for the response. more comments/questions below:
>
> On 4/21/21 2:28 AM, Olga Kornievskaia wrote:
> > On Tue, Apr 20, 2021 at 4:59 PM guy keren <[email protected]> wrote:
> >> Hi,
> >>
> >> when attempting to make two NFS 4.1 mounts from a linux NFS client, to
> >> two IP addresses belonging to two different hosts in the same cluster
> >> (i.e. the server major id in the EXCHANGE_ID response is the same) - the
> >> linux NFS4.1 client discards the new TCP connection (to the 2nd IP) and
> >> instead decides to use the first client connection for both mounts. this
> >> seems to be handled in a hard-coded inside the function named
> >> "nfs41_discover_server_trunking", and leads to reduced performance,
> >> relative to using NFS3 (which will use two different TCP connections to
> >> the two different hosts in the storage cluster).
> >>
> >> i was under the impression that (client_id) trunking is supposed to
> >> allow to multiplex commands over multiple TCP connections - not to
> >> consolidate different workloads onto the same TCP connection.
> >>
> >> is there a way to avoid this behaviour, other then faking that the
> >> "server major id" is different on each node in the cluster? (this is
> >> what appears to be done by NetApp, for instance).
> > Hi Guy,
> >
> > Current implementation of the linux client does not support session
> > trunking to the MDS (nor does it support client id trunking). I'm
> > hoping session trunking support comes in the near future. Clientid
> > trunking might not be something that's supported unless we'll have a
> > clustered NFS server out there that can utilize that behaviour.
>
> i see.
>
> > Btw you can do multipath NFS flows by using the combination of
> > nconnect and the newly proposed sysfs interface (still in review) that
> > can manipulate server endpoints.
>
> the problem with nconnect is that although we will have multiple TCP connections, they will all utilize the same session, which limits the requests parallelism that can be achieved (since the slot table size is the limiting factor for the number of in-flight commands).
>
> the same problem will also exist with session trunking - while when doing only client-id trunking (with a separate NFS4.1 session per TCP connection) - the number of in-flight commands can be increased linearly to the number of TCP connections.
>
> is there any way to work around that?
>
> p.s. is there anyone actively working on session trunking support, or is it just a "future roadmap item: with no concrete plans?

It has been on my todo list but I've been having a hard time finding
enough time to focus on this. I do believe that having a proper
session trunking implementation (where the client discovers server's
session trunking abilities via FS_LOCATIONS) to achieve multipathing
is the way to go. The sysfs interface was never intended to provide
multipathing (at least not in my goals), but rather to be a way to
manage transports in situations where we don't have a good way of
dealing with them otherwise.

I think the current maximum number of concurrent v4.1+ requests is
1024; is that too low a number? Btw, I'm not aware of any servers that
can do more than that (but my server implementation knowledge is
limited). If that is so, then perhaps the focus should be on allowing
for a larger slot table?
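
If I remember the client code right, the per-session default and cap
are exposed as an nfs module parameter (please double-check the path on
your kernel):

```shell
# slot cap applied to new v4.1+ mounts (default 64; the client's
# compiled-in maximum is 1024)
cat /sys/module/nfs/parameters/max_session_slots

# raise it for subsequent mounts, assuming the server grants that many slots
echo 1024 > /sys/module/nfs/parameters/max_session_slots
```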

>
>
> thanks,
>
> --guy keren
>
> Vast Data.