2008-04-16 02:58:59

by J. Bruce Fields

Subject: Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests

On Tue, Apr 15, 2008 at 09:43:10PM -0500, Tom Tucker wrote:
>
> On Tue, 2008-04-15 at 11:12 -0400, J. Bruce Fields wrote:
> > On Mon, Apr 14, 2008 at 11:48:33PM -0500, Tom Tucker wrote:
> > >
> > > Maybe this is a TCP_BACKLOG issue?
> >
> > So, looking around.... There seems to be a global limit in
> > /proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024?); might be worth
> > seeing what happens if that's increased, e.g., with
> >
> > echo 2048 >/proc/sys/net/ipv4/tcp_max_syn_backlog
>
> I think this represents the collective total for all listening
> endpoints. I think we're only talking about mountd.

Yes.

> Shooting from the hip...
>
> My gray-haired recollection is that the default backlog for a single
> listening endpoint is 10 (SYNs received, not yet accepted connections).
> Additional SYNs received on this endpoint will be dropped... clients
> will retry the SYN as part of normal TCP retransmit...
>
> It might be that the CLOSE_WAITs in the log are _normal_. That is, they
> reflect completed mount requests that are in the normal close path. If
> they never go away, then that's not normal. Is this the case?

What he said was:

"those fall over after some time and stay in CLOSE_WAIT state
until I restart the nfs-kernel-server."

Carsten, are you positive that the same sockets were in CLOSE_WAIT the
whole time you were watching? And how long was it before you gave up
and restarted?
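
(One way to tell: something like

    watch -n 5 'netstat -tn | grep CLOSE_WAIT'

on the server would show whether the same address/port pairs stick
around from sample to sample, or whether it's just ordinary churn.)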

> Suppose the 10 is roughly correct. The remaining "jilted" clients will
> retransmit their SYN after a randomized exponential backoff. I think you
> can imagine that trying 1300+ connections of which only 10 succeed, and
> then retrying the remaining ~1290 with randomized exponential backoff,
> might get you some pretty bad performance.

Right, could be, but:

...
> > Oh, but: Grepping the glibc rpc code, it looks like it calls listen with
> > second argument SOMAXCONN == 128. You can confirm that by strace'ing
> > rpc.mountd -F and looking for the listen call.
> >
> > And that socket's shared between all the mountd processes, so I guess
> > that's the real limit. I don't see an easy way to adjust that. You'd
> > also need to increase /proc/sys/net/core/somaxconn first.
> >
> > But none of this explains why we'd see connections stuck in CLOSE_WAIT
> > indefinitely?

So the limit appears to be more like 128, and (based on my quick look at
the code) that appears to be baked into the glibc rpc code.
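
For example, something like

    strace -e trace=listen rpc.mountd -F

should show the listen() call with a backlog argument of 128.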

Maybe you could code around that in mountd. Looks like the relevant
code is in nfs-utils/support/nfs/rpcmisc.c:rpc_init().
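
A minimal sketch of the idea (hypothetical helper, not actual nfs-utils
code; it assumes the SVCXPRT returned by glibc's svctcp_create(), whose
xp_sock field is the listening socket):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <rpc/rpc.h>

    /* Hypothetical helper for rpc_init(): on Linux, calling listen()
     * again on an already-listening socket just updates the backlog,
     * so this raises the accept queue past the SOMAXCONN value glibc
     * passed.  The effective backlog is still clamped by
     * /proc/sys/net/core/somaxconn, so bump that sysctl first. */
    static void raise_rpc_backlog(SVCXPRT *transp, int backlog)
    {
        if (listen(transp->xp_sock, backlog) < 0)
            perror("listen");
    }

Called right after the TCP transport is created, with somaxconn raised
first, that should let more than 128 mount attempts queue up.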

--b.




2008-04-16 03:22:22

by Tom Tucker

Subject: Re: [NFS] How to set-up a Linux NFS server to handle massive number of requests


On Tue, 2008-04-15 at 22:58 -0400, J. Bruce Fields wrote:
> On Tue, Apr 15, 2008 at 09:43:10PM -0500, Tom Tucker wrote:
> >
> > On Tue, 2008-04-15 at 11:12 -0400, J. Bruce Fields wrote:
> > > On Mon, Apr 14, 2008 at 11:48:33PM -0500, Tom Tucker wrote:
> > > >
> > > > Maybe this is a TCP_BACKLOG issue?
> > >
> > > So, looking around.... There seems to be a global limit in
> > > /proc/sys/net/ipv4/tcp_max_syn_backlog (default 1024?); might be worth
> > > seeing what happens if that's increased, e.g., with
> > >
> > > echo 2048 >/proc/sys/net/ipv4/tcp_max_syn_backlog
> >
> > I think this represents the collective total for all listening
> > endpoints. I think we're only talking about mountd.
>
> Yes.
>
> > Shooting from the hip...
> >
> > My gray-haired recollection is that the default backlog for a single
> > listening endpoint is 10 (SYNs received, not yet accepted connections).
> > Additional SYNs received on this endpoint will be dropped... clients
> > will retry the SYN as part of normal TCP retransmit...
> >
> > It might be that the CLOSE_WAITs in the log are _normal_. That is, they
> > reflect completed mount requests that are in the normal close path. If
> > they never go away, then that's not normal. Is this the case?
>
> What he said was:
>
> "those fall over after some time and stay in CLOSE_WAIT state
> until I restart the nfs-kernel-server."
>
> Carsten, are you positive that the same sockets were in CLOSE_WAIT the
> whole time you were watching? And how long was it before you gave up
> and restarted?
>
> > Suppose the 10 is roughly correct. The remaining "jilted" clients will
> > retransmit their SYN after a randomized exponential backoff. I think you
> > can imagine that trying 1300+ connections of which only 10 succeed, and
> > then retrying the remaining ~1290 with randomized exponential backoff,
> > might get you some pretty bad performance.
>
> Right, could be, but:
>
> ...
> > > Oh, but: Grepping the glibc rpc code, it looks like it calls listen with
> > > second argument SOMAXCONN == 128. You can confirm that by strace'ing
> > > rpc.mountd -F and looking for the listen call.
> > >
> > > And that socket's shared between all the mountd processes, so I guess
> > > that's the real limit. I don't see an easy way to adjust that. You'd
> > > also need to increase /proc/sys/net/core/somaxconn first.
> > >
> > > But none of this explains why we'd see connections stuck in CLOSE_WAIT
> > > indefinitely?
>
> So the limit appears to be more like 128, and (based on my quick look at
> the code) that appears to be baked into the glibc rpc code.
>
> Maybe you could code around that in mountd. Looks like the relevant
> code is in nfs-utils/support/nfs/rpcmisc.c:rpc_init().

If you really need to start 1300 mounts all at once, then something needs
to change. BTW, even after you get past mountd, the server is going to
get pounded with SYNs and RPC_NOPs.

It might be interesting to look at httpd (Apache) to see what it does. I
would think it faces similar traffic flows.
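
For what it's worth, Apache makes this tunable in its config; the
ListenBacklog directive's value goes straight into the listen() call:

    # httpd.conf
    ListenBacklog 1024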

>
> --b.

