2008-06-19 13:33:41

by Olga Kornievskaia

Subject: [PATCH] NFSD: fix use of setsockopt

The following patch fixes the NFS server's use of setsockopt. For this
function to take effect, it must be called after socket creation but
before the socket is bound.

This patch also changes the size of the receive socket buffer to be the
same as the send socket buffer. Both buffers are now sized as a multiple
of maxpayload and the number of nfsd threads.

This patch fixes the problem that the receive window never opens beyond
the default TCP receive window size set by the second parameter of the
net.ipv4.tcp_rmem sysctl.

Signed-off-by: Olga Kornievskaia <[email protected]>


Attachments:
nfsd-fix-sockopt-7.patch (1.21 kB)
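
The sizing change described in this message can be sketched as plain arithmetic. This is only an illustration in user-space C, not code from the patch: the function names are hypothetical, and the only thing taken from the patch is the two expressions, 3 * sv_max_mesg (old receive request) and (sv_nrthreads + 3) * sv_max_mesg (new request for both buffers).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the receive buffer request used before the patch,
 * i.e. room for three maximum-size messages regardless of thread count. */
static size_t rcvbuf_before(size_t max_mesg)
{
    assert(max_mesg <= SIZE_MAX / 3);   /* guard against overflow */
    return 3 * max_mesg;
}

/* Hypothetical helper: the request used after the patch for both the send
 * and the receive buffer, which grows with the number of nfsd threads. */
static size_t bufsize_after(unsigned int nrthreads, size_t max_mesg)
{
    size_t factor = (size_t)nrthreads + 3;

    assert(max_mesg <= SIZE_MAX / factor);  /* guard against overflow */
    return factor * max_mesg;
}
```

With an assumed 8 threads and a 1 MiB maximum message, rcvbuf_before(1048576) gives 3145728 bytes, while bufsize_after(8, 1048576) gives 11534336 bytes, so the receive request now scales with the thread count just as the send request already did.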

2008-06-25 00:50:35

by Dean

Subject: Re: [PATCH] NFSD: fix use of setsockopt

Hi Olga,

This makes sense: if NFSD is going to ignore the global Linux TCP settings
and 'go it alone', then it shouldn't be constrained by them.

At least now we can increase the rcv buffer size by increasing the
number of NFSDs. I would still like to pursue my sysctl patch for the
rcv and snd buffers, though, since we have seen situations where too many
NFSDs can increase the randomness of requests to the underlying file
system, reducing the effectiveness of readahead/write gathering.

Dean

Olga Kornievskaia wrote:
> The following patch fixes the NFS server's use of setsockopt. For this
> function to take effect, it must be called after socket creation but
> before the socket is bound.
>
> This patch also changes the size of the receive socket buffer to be the
> same as the send socket buffer. Both buffers are now sized as a multiple
> of maxpayload and the number of nfsd threads.
>
> This patch fixes the problem that the receive window never opens beyond
> the default TCP receive window size set by the second parameter of the
> net.ipv4.tcp_rmem sysctl.
>
> Signed-off-by: Olga Kornievskaia <[email protected]>

------------------------------------------------------------------------

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index c75bffe..178b397 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1191,7 +1191,7 @@ svc_tcp_recvfrom(struct svc_rqst *rqstp)
 		 */
 		svc_sock_setbufsize(svsk->sk_sock,
 				    (serv->sv_nrthreads+3) * serv->sv_max_mesg,
-				    3 * serv->sv_max_mesg);
+				    (serv->sv_nrthreads+3) * serv->sv_max_mesg);
 
 	clear_bit(SK_DATA, &svsk->sk_flags);
 
@@ -1372,11 +1372,6 @@ svc_tcp_init(struct svc_sock *svsk)
 		 * receive and respond to one request.
 		 * svc_tcp_recvfrom will re-adjust if necessary
 		 */
-		svc_sock_setbufsize(svsk->sk_sock,
-				    3 * svsk->sk_server->sv_max_mesg,
-				    3 * svsk->sk_server->sv_max_mesg);
-
-		set_bit(SK_CHNGBUF, &svsk->sk_flags);
 		set_bit(SK_DATA, &svsk->sk_flags);
 		if (sk->sk_state != TCP_ESTABLISHED)
 			set_bit(SK_CLOSE, &svsk->sk_flags);
@@ -1761,6 +1756,8 @@ static int svc_create_socket(struct svc_serv *serv, int protocol,
 
 	if (type == SOCK_STREAM)
 		sock->sk->sk_reuse = 1;		/* allow address reuse */
+	svc_sock_setbufsize(sock, (serv->sv_nrthreads+3) * serv->sv_max_mesg,
+			(serv->sv_nrthreads+3) * serv->sv_max_mesg);
 	error = kernel_bind(sock, sin, len);
 	if (error < 0)
 		goto bummer;


2008-06-25 19:37:58

by J. Bruce Fields

Subject: Re: [PATCH] NFSD: fix use of setsockopt

On Thu, Jun 19, 2008 at 09:33:39AM -0400, Olga Kornievskaia wrote:
> The following patch fixes the NFS server's use of setsockopt. For this
> function to take effect, it must be called after socket creation but
> before the socket is bound.

The tcp(7) man page actually claims that it's listen() and connect()
that matter (so the setsockopt is effective on (and only on) unconnected
sockets), so probably this could go after the bind and before the
listen? Not that it matters.
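
For readers who want to see the ordering from user space, here is a minimal sketch, assuming a Linux system and the ordinary sockets API rather than the in-kernel svc_sock_setbufsize() path the patch touches. It requests a receive buffer size after bind() but before listen(), which is the ordering tcp(7) asks for. The helper names listen_with_rcvbuf() and effective_rcvbuf() are made up for this example.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, bind it to an ephemeral port,
 * request rcvbuf_bytes of receive buffer, and only then call listen().
 * Returns the listening descriptor, or -1 on error. */
static int listen_with_rcvbuf(int rcvbuf_bytes)
{
    struct sockaddr_in addr;
    int fd;

    assert(rcvbuf_bytes > 0);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        goto fail;

    /* The buffer size is requested while the socket is still unconnected
     * and not yet listening. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &rcvbuf_bytes, sizeof(rcvbuf_bytes)) < 0)
        goto fail;

    if (listen(fd, 16) < 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}

/* Hypothetical helper: read back the receive buffer size the kernel
 * applied to the socket, or -1 on error. */
static int effective_rcvbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}
```

Calling effective_rcvbuf(listen_with_rcvbuf(65536)) shows what the kernel actually granted. The value read back need not equal the request, since the kernel may adjust or cap it.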

> This patch also changes the size of the receive socket buffer to be the
> same as the send socket buffer. Both buffers are now sized as a multiple
> of maxpayload and the number of nfsd threads.

It would be nice if we could get some review from someone who remembers
what the justification for the smaller receive buffer size was (Neil?).

> This patch fixes the problem that the receive window never opens beyond
> the default TCP receive window size set by the second parameter of the
> net.ipv4.tcp_rmem sysctl.

Do you know what it does in the udp case?

--b.



2008-06-25 19:40:46

by J. Bruce Fields

Subject: Re: [PATCH] NFSD: fix use of setsockopt

On Tue, Jun 24, 2008 at 05:50:31PM -0700, Dean Hildebrand wrote:
> Hi Olga,
>
> This makes sense: if NFSD is going to ignore the global Linux TCP settings
> and 'go it alone', then it shouldn't be constrained by them.
>
> At least now we can increase the rcv buffer size by increasing the
> number of NFSDs. I would still like to pursue my sysctl patch for the
> rcv and snd buffers, though, since we have seen situations where too many
> NFSDs can increase the randomness of requests to the underlying file
> system, reducing the effectiveness of readahead/write gathering.

Olga says she's also seeing some performance decrease with increasing
numbers of threads in our 10 gigabit testing, and I was wondering if
something like that could explain the change.

Anyone have ideas how we could measure how ordered our IO requests are?
(Or how much seeking the drives in our raid array are doing?)

--b.


2008-06-25 20:44:19

by Olga Kornievskaia

Subject: Re: [PATCH] NFSD: fix use of setsockopt



J. Bruce Fields wrote:
> On Thu, Jun 19, 2008 at 09:33:39AM -0400, Olga Kornievskaia wrote:
>> The following patch fixes the NFS server's use of setsockopt. For this
>> function to take effect, it must be called after socket creation but
>> before the socket is bound.
>
> The tcp(7) man page actually claims that it's listen() and connect()
> that matter (so the setsockopt is effective on (and only on) unconnected
> sockets), so probably this could go after the bind and before the
> listen? Not that it matters.
>
>> This patch also changes the size of the receive socket buffer to be the
>> same as the send socket buffer. Both buffers are now sized as a multiple
>> of maxpayload and the number of nfsd threads.
>
> It would be nice if we could get some review from someone who remembers
> what the justification for the smaller receive buffer size was (Neil?).
>
>> This patch fixes the problem that the receive window never opens beyond
>> the default TCP receive window size set by the second parameter of the
>> net.ipv4.tcp_rmem sysctl.
>
> Do you know what it does in the udp case?

Looking at the kernel code, when setsockopt() is called on a UDP socket
to set the send/receive buffer sizes, the code does nothing:
udp_setsockopt() and udp_lib_setsockopt() return -ENOPROTOOPT.
However, we bypass the call to setsockopt() and instead set the buffer
sizes directly. From what I understand, sk_sndbuf/sk_rcvbuf are not used
by the UDP code, so we are setting fields that are never used.

Then perhaps we can remove calls to svc_sock_setbufsize() from
svc_udp_init() and svc_udp_recvfrom()?

2008-06-26 17:56:09

by J. Bruce Fields

Subject: Re: [PATCH] NFSD: fix use of setsockopt

On Wed, Jun 25, 2008 at 04:44:15PM -0400, Olga Kornievskaia wrote:
> Looking at the kernel code, when setsockopt() is called on a UDP socket
> to set the send/receive buffer sizes, the code does nothing:
> udp_setsockopt() and udp_lib_setsockopt() return -ENOPROTOOPT.
> However, we bypass the call to setsockopt() and instead set the buffer
> sizes directly. From what I understand, sk_sndbuf/sk_rcvbuf are not used
> by the UDP code, so we are setting fields that are never used.
>
> Then perhaps we can remove calls to svc_sock_setbufsize() from
> svc_udp_init() and svc_udp_recvfrom()?

Assuming you're correct about udp not using those fields (haven't
checked myself)--yes, that'd be great.

--b.

2008-06-26 18:09:12

by Laurenz, Dirk

Subject: Sm-notify

Hi,

Does anybody know exactly how sm-notify works on a SUSE SLES 9 system?

Greetings,

Dirk

2008-06-27 02:23:04

by NeilBrown

Subject: Re: Sm-notify

On Fri, June 27, 2008 3:59 am, Laurenz, Dirk wrote:
> Hi,
>
> Does anybody know exactly how sm-notify works on a SUSE SLES 9 system?
>

Yes.

Do you have a specific question about it?

NeilBrown


2008-06-27 07:08:08

by Oeltze, Benjamin

Subject: RE: Sm-notify

Hi,
Maybe I can clarify Dirk's question.

We are trying to set up an NFS cluster and want to use sm-notify to inform
the clients when the server has failed over.
We found a HOWTO that says sm-notify uses the files under /var/lib/nfs/sm
to determine which clients it has to inform.
Our problem is that there are no files in this directory, even when we
connect with a client. Is there anything we have forgotten?

Best regards,
Benjamin Oeltze
Systems Engineer
DE ISC SOP PS N/O
Fujitsu Siemens Computers
Hildesheimer Str. 25
30880 Laatzen
Telephone: 0511-8489 1872
Mobile: 0160-96354617
Email: mailto: benjamin.oeltze-/ixSogHR0HOS/[email protected]
Internet: http://www.fujitsu-siemens.com
Company details: http://www.fujitsu-siemens.de/imprint.html
