2022-02-16 07:03:16

by Patrick Goetz

Subject: How are client requests load balanced across multiple nfsd processes?

However inappropriate this question is, I'm asking anyway, as this has
been driving me nuts for 2 weeks and I can't find an answer.

When I set

RPCNFSDCOUNT=16

what I thought this did was spin up an nfsd thread master with 15
threads and the thread master would pass out client requests to the
threads, but looking at /proc/$PID/status -> TGID clearly indicates
these are all entirely separate processes. (I wasn't sure if ps showed
threads as separate processes; apparently it doesn't.)

So the question is how do different client requests get farmed out to
different nfsd daemons for service? Who's actually listening on port 2049?

If there's a reference other than the source code where I can read up on
this, I'm interested. I looked in a couple of Linux programming books,
including Richard Stevens, and couldn't find what I was looking for.

This was all prompted by some vendor trying to sell me an EC (Erasure
Coding) n+m system who commented "NFS isn't multi-threaded, NFS can only
communicate with one server, for a shared/mounted filesystem, so it will
always be limited to the speed of that NFS Server. POSIX/Multi-threaded
means the filesystem is parallel and can be reading/writing to multiple
nodes at once in a storage cluster/setup. The opposite of NFS."

I think pNFS addresses this, but then how does one implement pNFS?



2022-02-16 19:53:58

by J. Bruce Fields

Subject: Re: How are client requests load balanced across multiple nfsd processes?

On Tue, Feb 15, 2022 at 04:13:25PM -0600, Patrick Goetz wrote:
> When I set
>
> RPCNFSDCOUNT=16
>
> what I thought this did was spin up an nfsd thread master with 15
> threads and the thread master would pass out client requests to the
> threads, but looking at /proc/$PID/status -> TGID clearly indicates
> these are all entirely separate processes. (I wasn't sure if ps
> showed threads as separate processes; apparently it doesn't.)

They're all kernel tasks, which makes the distinction between "thread"
and "process" a little vague.
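One way to see this from userspace is to read the Name, Pid, Tgid and PPid fields of /proc/<pid>/status for every task named nfsd. The Python sketch below does that; it assumes a Linux /proc layout and that the server threads appear under the name "nfsd" (the function names are hypothetical). Each such task would be expected to report a Tgid equal to its own Pid, which is why they look like separate processes, and, being kernel threads, a parent of kthreadd (PID 2) rather than a userspace daemon.

```python
import os

def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields

def nfsd_tasks(proc_root="/proc"):
    """Yield (pid, tgid, ppid) for every task whose Name is 'nfsd'.

    Assumes a Linux-style /proc; yields nothing if proc_root does not exist.
    """
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as f:
                fields = parse_status(f.read())
        except OSError:
            continue  # the task exited between listdir() and open()
        if fields.get("Name") == "nfsd":
            # Kernel threads are normally children of kthreadd (PID 2).
            yield int(fields["Pid"]), int(fields["Tgid"]), int(fields["PPid"])

if __name__ == "__main__":
    for pid, tgid, ppid in nfsd_tasks():
        print(f"pid={pid} tgid={tgid} ppid={ppid}")
```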

> So the question is how do different client requests get farmed out
> to different nfsd daemons for service? Who's actually listening on
> port 2049?

There's no user process that calls "listen"; knfsd's normal RPC handling
is all in-kernel. Incoming RPCs may be handed to any of those 16 tasks
for processing. A single task just runs a loop where it receives an
RPC, handles it, and sends a response back.
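The per-task loop described above can be illustrated with a userspace analogy. The Python sketch below is not knfsd code, and the names serve, worker and handle are hypothetical. A fixed pool of threads (16, mirroring RPCNFSDCOUNT=16) all block on one shared queue and each runs the same receive/handle/reply loop, so in this sketch there is no master thread choosing a worker: whichever thread is idle takes the next request.

```python
import queue
import threading

NUM_WORKERS = 16  # plays the role of RPCNFSDCOUNT=16 in this analogy

def handle(request):
    """Stand-in for processing one RPC; here it just builds a reply string."""
    return f"reply to {request}"

def worker(requests, replies):
    """One task's loop: receive a request, handle it, send the reply back."""
    while True:
        request = requests.get()  # blocks until some request is available
        if request is None:       # sentinel value: shut this worker down
            break
        replies.put((threading.current_thread().name, handle(request)))

def serve(incoming, num_workers=NUM_WORKERS):
    """Feed all incoming requests to a pool of identical workers."""
    requests = queue.Queue()
    replies = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(requests, replies), name=f"worker{i}")
        for i in range(num_workers)
    ]
    for t in threads:
        t.start()
    for request in incoming:
        requests.put(request)  # any idle worker may pick this up
    for _ in threads:
        requests.put(None)     # one sentinel per worker, queued after all requests
    for t in threads:
        t.join()
    results = []
    while not replies.empty():
        results.append(replies.get())
    return results
```

For example, serve([f"rpc{i}" for i in range(100)]) returns 100 (worker name, reply) pairs, with the requests spread over whichever workers happened to be free.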

> This was all prompted by some vendor trying to sell me an EC
> (Erasure Coding) n+m system who commented "NFS isn't multi-threaded,
> NFS can only communicate with one server, for a shared/mounted
> filesystem, so it will always be limited to the speed of that NFS
> Server. POSIX/Multi-threaded means the filesystem is parallel and
> can be reading/writing to multiple nodes at once in a storage
> cluster/setup. The opposite of NFS."

That explanation is a little muddled. NFS clients and servers both
typically have lots of parallelism. Whether it's sufficient for your
purposes depends on exactly what you need.

But, yes, they're mostly correct to say that, in the absence of pNFS,
"NFS can only communicate with one server, for a shared/mounted
filesystem, so it will always be limited to the speed of that NFS
Server".

> I think pNFS addresses this, but then how does one implement pNFS?

So, right, pNFS can let you perform IO to multiple servers
simultaneously, if that's what you need.
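As a rough illustration of how pNFS spreads I/O, the sketch below splits a byte range across several data servers using plain round-robin striping with a fixed stripe unit. That is a simplifying assumption loosely modeled on the idea behind the pNFS files layout, not an implementation of it; the function stripe_map and the server names are hypothetical. A client holding such a mapping can send the resulting pieces to the different servers in parallel instead of funnelling everything through one.

```python
def stripe_map(offset, length, stripe_unit, servers):
    """Split [offset, offset + length) into (server, offset, length) pieces.

    Assumes simple round-robin striping: stripe unit number k lives on
    servers[k % len(servers)].
    """
    if stripe_unit <= 0 or not servers:
        raise ValueError("need a positive stripe unit and at least one server")
    pieces = []
    pos, end = offset, offset + length
    while pos < end:
        unit = pos // stripe_unit                  # which stripe unit pos falls in
        server = servers[unit % len(servers)]      # round-robin placement
        chunk = min(end, (unit + 1) * stripe_unit) - pos  # stop at unit boundary
        pieces.append((server, pos, chunk))
        pos += chunk
    return pieces
```

For example, stripe_map(0, 10, 4, ["ds0", "ds1", "ds2"]) returns [("ds0", 0, 4), ("ds1", 4, 4), ("ds2", 8, 2)], so a 10-byte read touches all three servers.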

The Linux NFS client has support for pNFS, but the kernel server
doesn't, so you'd need to look elsewhere for a pNFS server.

Whether any of this is useful to you depends on exactly what problem
you're trying to solve.

--b.

2022-02-16 20:59:01

by J. Bruce Fields

Subject: Re: How are client requests load balanced across multiple nfsd processes?

On Wed, Feb 16, 2022 at 01:33:55PM -0600, Patrick Goetz wrote:
> On 2/16/22 13:22, J. Bruce Fields wrote:
> >There's no user process that calls "listen"; knfsd's normal RPC handling
> >is all in-kernel. Incoming RPCs may be handed to any of those 16 tasks
> >for processing. A single task just runs a loop where it receives an
> >RPC, handles it, and sends a response back.
> >
>
> How does knfsd decide what user space nfsd process to hand a task
> off to?

To be clear, knfsd tasks never run in userspace at all.

> Is it random, round robin, or something more sophisticated?

It's complicated, and I'd have to look at the code. It's an
implementation detail that nobody should have to depend on.

> Or does it even matter if nfsd is only handling one request at a
> time anyway?

If you're running with 16 threads, then it can be (oversimplifying a
bit) handling up to 16 requests at a time.
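The "up to 16 at a time" point can be checked with a small thread-pool experiment. The Python sketch below uses concurrent.futures.ThreadPoolExecutor as a stand-in for the nfsd threads (the class and function names are hypothetical) and records the peak number of requests being handled simultaneously. With max_workers=16 the peak should never exceed 16, and with a single worker it stays at 1, however many requests are queued.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class ConcurrencyMeter:
    """Tracks how many requests are being handled at the same moment."""

    def __init__(self):
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def handle(self, request):
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)
        time.sleep(0.01)  # pretend the request takes a while (e.g. disk I/O)
        with self._lock:
            self.current -= 1
        return request

def peak_concurrency(num_threads, num_requests):
    """Run num_requests through a pool of num_threads; return the peak overlap."""
    meter = ConcurrencyMeter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(meter.handle, range(num_requests)))
    return meter.peak
```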

--b.

2022-02-17 07:51:17

by Patrick Goetz

Subject: Re: How are client requests load balanced across multiple nfsd processes?

Thanks, Bruce. As always your responses are super informative. Just
one follow up question based strictly on curiosity:

On 2/16/22 13:22, J. Bruce Fields wrote:
> On Tue, Feb 15, 2022 at 04:13:25PM -0600, Patrick Goetz wrote:
>> When I set
>>
>> RPCNFSDCOUNT=16
>>
>> what I thought this did was spin up an nfsd thread master with 15
>> threads and the thread master would pass out client requests to the
>> threads, but looking at /proc/$PID/status -> TGID clearly indicates
>> these are all entirely separate processes. (I wasn't sure if ps
>> showed threads as separate processes; apparently it doesn't.)
>
> They're all kernel tasks, which makes the distinction between "thread"
> and "process" a little vague.
>
>> So the question is how do different client requests get farmed out
>> to different nfsd daemons for service? Who's actually listening on
>> port 2049?
>
> There's no user process that calls "listen"; knfsd's normal RPC handling
> is all in-kernel. Incoming RPCs may be handed to any of those 16 tasks
> for processing. A single task just runs a loop where it receives an
> RPC, handles it, and sends a response back.
>

How does knfsd decide what user space nfsd process to hand a task off
to? Is it random, round robin, or something more sophisticated? Or does
it even matter if nfsd is only handling one request at a time anyway?
