2019-07-03 01:36:17

by Paul Menzel

Subject: Regression caused by commit c54f24e3 (nfsd: fix performance-limiting session calculation)

Dear Bruce,


Could it be that commit c54f24e3 (nfsd: fix performance-limiting session
calculation) causes a regression on big memory machines (1 TB)?

> From c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 Mon Sep 17 00:00:00 2001
> From: "J. Bruce Fields" <[email protected]>
> Date: Thu, 21 Feb 2019 10:47:00 -0500
> Subject: [PATCH] nfsd: fix performance-limiting session calculation
>
> We're unintentionally limiting the number of slots per nfsv4.1 session
> to 10. Often more than 10 simultaneous RPCs are needed for the best
> performance.
>
> This calculation was meant to prevent any one client from using up more
> than a third of the limit we set for total memory use across all clients
> and sessions. Instead, it's limiting the client to a third of the
> maximum for a single session.
>
> Fix this.
>
> Reported-by: Chris Tracy <[email protected]>
> Cc: [email protected]
> Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches"
> Signed-off-by: J. Bruce Fields <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fb3c9844c82a..6a45fb00c5fc 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1544,16 +1544,16 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
> {
> u32 slotsize = slot_bytes(ca);
> u32 num = ca->maxreqs;
> - int avail;
> + unsigned long avail, total_avail;
>
> spin_lock(&nfsd_drc_lock);
> - avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION,
> - nfsd_drc_max_mem - nfsd_drc_mem_used);
> + total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
> + avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
> /*
> * Never use more than a third of the remaining memory,
> * unless it's the only way to give this client a slot:
> */
> - avail = clamp_t(int, avail, slotsize, avail/3);
> + avail = clamp_t(int, avail, slotsize, total_avail/3);
> num = min_t(int, num, avail / slotsize);
> nfsd_drc_mem_used += num * slotsize;
> spin_unlock(&nfsd_drc_lock);

Booting an 80-thread, 1 TB server with either Linux 4.19.56 or Linux 5.2-rc7
causes connection problems for the clients. The problems do not happen
on servers with just 96 GB of memory, for example. Bisecting points to the
two commits below (I can only continue tomorrow).

c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 (nfsd: fix performance-limiting
session calculation)
8127d82705998568b52ac724e28e00941538083d (NFS: Don't recoalesce on error
in nfs_pageio_complete_mirror())

If there is anything I could do to verify this besides reverting the commits
tomorrow, please tell me. It’d be great if this could be fixed before Linux
5.2 is released.


Kind regards,

Paul


2019-07-03 09:57:17

by Paul Menzel

Subject: Re: Regression caused by commit c54f24e3 (nfsd: fix performance-limiting session calculation)

Dear Bruce,


On 7/2/19 11:59 PM, Paul Menzel wrote:

> Could it be that commit c54f24e3 (nfsd: fix performance-limiting
> session calculation) causes a regression on big memory machines (1
> TB)?
>
>> From c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 Mon Sep 17 00:00:00 2001
>> From: "J. Bruce Fields" <[email protected]>
>> Date: Thu, 21 Feb 2019 10:47:00 -0500
>> Subject: [PATCH] nfsd: fix performance-limiting session calculation
>>
>> We're unintentionally limiting the number of slots per nfsv4.1 session
>> to 10.  Often more than 10 simultaneous RPCs are needed for the best
>> performance.
>>
>> This calculation was meant to prevent any one client from using up more
>> than a third of the limit we set for total memory use across all clients
>> and sessions.  Instead, it's limiting the client to a third of the
>> maximum for a single session.
>>
>> Fix this.
>>
>> Reported-by: Chris Tracy <[email protected]>
>> Cc: [email protected]
>> Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches"
>> Signed-off-by: J. Bruce Fields <[email protected]>
>> ---
>>  fs/nfsd/nfs4state.c | 8 ++++----
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index fb3c9844c82a..6a45fb00c5fc 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -1544,16 +1544,16 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
>>  {
>>      u32 slotsize = slot_bytes(ca);
>>      u32 num = ca->maxreqs;
>> -    int avail;
>> +    unsigned long avail, total_avail;
>>  
>>      spin_lock(&nfsd_drc_lock);
>> -    avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION,
>> -            nfsd_drc_max_mem - nfsd_drc_mem_used);
>> +    total_avail = nfsd_drc_max_mem - nfsd_drc_mem_used;
>> +    avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION, total_avail);
>>      /*
>>       * Never use more than a third of the remaining memory,
>>       * unless it's the only way to give this client a slot:
>>       */
>> -    avail = clamp_t(int, avail, slotsize, avail/3);
>> +    avail = clamp_t(int, avail, slotsize, total_avail/3);
>>      num = min_t(int, num, avail / slotsize);
>>      nfsd_drc_mem_used += num * slotsize;
>>      spin_unlock(&nfsd_drc_lock);
>
> Booting an 80-thread, 1 TB server with either Linux 4.19.56 or Linux
> 5.2-rc7 causes connection problems for the clients. The problems do
> not happen on servers with just 96 GB of memory, for example. Bisecting
> points to the two commits below (I can only continue tomorrow).
>
> c54f24e338ed2a35218f117a4a1afb5f9e2b4e64 (nfsd: fix performance-limiting session calculation)
> 8127d82705998568b52ac724e28e00941538083d (NFS: Don't recoalesce on error in nfs_pageio_complete_mirror())
>
> If there is anything I could do to verify this besides reverting the
> commits tomorrow, please tell me. It’d be great if this could be fixed
> before Linux 5.2 is released.

Reverting the suspected commit c54f24e3 indeed fixes the regression.
Please tell me how to continue. Should the commit be reverted or can
a fix be made shortly?

On a running system, `rpcdebug -m nfsd -s all` and
`rpcdebug -m rpc -s all` do not give a lot of information, as the
interesting values are set during module initialization. The debug
output is appended below.


Kind regards,

Paul


[ 765.444765] svc: socket 0000000093976859 TCP (listen) state change 10
[ 765.444784] svc: socket 0000000015376384 TCP (listen) state change 1
[ 765.444869] svc: tcp_accept 000000002ad76b70 sock 000000005eae2611
[ 765.444879] nfsd: connect from 141.14.17.51, port=785
[ 765.444881] svc: svc_setup_socket 000000003fe99ad1
[ 765.444883] setting up TCP socket for reading
[ 765.444885] svc: svc_setup_socket created 000000006952397a (inet 0000000015376384), listen 0 close 0
[ 765.444901] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.444904] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 765.444907] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.444912] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 765.444913] svc: TCP record, 40 bytes
[ 765.444919] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 40
[ 765.444920] svc: TCP final record (40 bytes)
[ 765.444926] svc: svc_authenticate (0)
[ 765.444930] svc: calling dispatcher
[ 765.444931] nfsd_dispatch: vers 4 proc 0
[ 765.444954] svc: socket 000000006952397a sendto([000000005133ae9c 28... ], 28) = 28 (addr 141.14.17.51, port=785)
[ 765.444963] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.444967] svc: tcp_accept 000000002ad76b70 sock 000000005eae2611
[ 765.444975] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.444980] svc: server 00000000be144e7a, pool 0, transport 000000006952397a, inuse=2
[ 765.444983] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.444988] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 765.444989] RPC: TCP recv_record got -11
[ 765.444990] RPC: TCP recvfrom got EAGAIN
[ 765.444993] svc: server 00000000be144e7a waiting for data (to = 3600000)
[ 765.445020] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 765.445023] svc: server 0000000011abceff waiting for data (to = 3600000)
[ 765.445177] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 765.445401] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 765.445411] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.445414] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 765.445415] svc: TCP record, 252 bytes
[ 765.445418] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 252
[ 765.445419] svc: TCP final record (252 bytes)
[ 765.445427] svc: svc_authenticate (1)
[ 765.445431] svc: calling dispatcher
[ 765.445432] nfsd_dispatch: vers 4 proc 1
[ 765.445436] nfsd4_exchange_id rqstp=00000000aad03597 exid=000000000472f450 clname.len=32 clname.data=00000000ed18490f ip_addr=141.14.17.51 flags 101, spa_how 0
[ 765.445442] renewing client (clientid 5d1c72c1/ed57da3d)
[ 765.445443] nfsd4_exchange_id seqid 0 flags 10001
[ 765.445444] nfsv4 compound returned 0
[ 765.445458] svc: socket 000000006952397a sendto([00000000d62d9bb6 136... ], 136) = 136 (addr 141.14.17.51, port=785)
[ 765.445460] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.445462] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=2
[ 765.445464] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.445467] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 765.445472] RPC: TCP recv_record got -11
[ 765.445473] RPC: TCP recvfrom got EAGAIN
[ 765.445475] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 765.445682] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 765.445760] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 765.445762] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.445765] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 765.445769] svc: TCP record, 252 bytes
[ 765.445773] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 252
[ 765.445774] svc: TCP final record (252 bytes)
[ 765.445779] svc: svc_authenticate (1)
[ 765.445784] svc: calling dispatcher
[ 765.445785] nfsd_dispatch: vers 4 proc 1
[ 765.445789] nfsd4_exchange_id rqstp=00000000aad03597 exid=000000000472f450 clname.len=32 clname.data=00000000ed18490f ip_addr=141.14.17.51 flags 101, spa_how 0
[ 765.445795] renewing client (clientid 5d1c72c1/ed57da3e)
[ 765.445795] nfsd4_exchange_id seqid 0 flags 10001
[ 765.445854] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=3
[ 765.445855] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.445859] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 765.445860] RPC: TCP recv_record got -11
[ 765.445860] RPC: TCP recvfrom got EAGAIN
[ 765.445862] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 765.445995] nfsv4 compound returned 0
[ 765.446013] svc: socket 000000006952397a sendto([000000006f8e68ac 136... ], 136) = 136 (addr 141.14.17.51, port=785)
[ 765.446020] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.446178] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 765.446193] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 765.446198] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.446200] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 765.446201] svc: TCP record, 220 bytes
[ 765.446205] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[ 765.446222] svc: TCP final record (220 bytes)
[ 765.446226] svc: svc_authenticate (1)
[ 765.446229] svc: calling dispatcher
[ 765.446230] nfsd_dispatch: vers 4 proc 1
[ 765.446233] nfsv4 compound returned 10008
[ 765.446243] svc: socket 000000006952397a sendto([00000000dda22c57 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[ 765.446244] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.446250] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 765.446251] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 765.446253] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 765.446253] RPC: TCP recv_record got -11
[ 765.446254] RPC: TCP recvfrom got EAGAIN
[ 765.446257] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 765.446397] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 766.500412] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 766.500445] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 766.500448] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 766.500452] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 766.500454] svc: TCP record, 220 bytes
[ 766.500459] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[ 766.500461] svc: TCP final record (220 bytes)
[ 766.500467] svc: svc_authenticate (1)
[ 766.500472] svc: calling dispatcher
[ 766.500474] nfsd_dispatch: vers 4 proc 1
[ 766.500480] nfsv4 compound returned 10008
[ 766.500498] svc: socket 000000006952397a sendto([000000006f8e68ac 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[ 766.500505] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 766.500509] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 766.500512] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 766.500515] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 766.500516] RPC: TCP recv_record got -11
[ 766.500517] RPC: TCP recvfrom got EAGAIN
[ 766.500519] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 766.500546] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 767.524372] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 767.524405] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 767.524408] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 767.524413] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 767.524415] svc: TCP record, 220 bytes
[ 767.524420] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[ 767.524422] svc: TCP final record (220 bytes)
[ 767.524428] svc: svc_authenticate (1)
[ 767.524434] svc: calling dispatcher
[ 767.524435] nfsd_dispatch: vers 4 proc 1
[ 767.524441] nfsv4 compound returned 10008
[ 767.524448] svc: server 000000002b852bb6, pool 0, transport 000000006952397a, inuse=3
[ 767.524451] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 767.524470] svc: socket 000000006952397a sendto([00000000dda22c57 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[ 767.524474] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 767.524476] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 767.524477] RPC: TCP recv_record got -11
[ 767.524478] RPC: TCP recvfrom got EAGAIN
[ 767.524482] svc: server 000000002b852bb6 waiting for data (to = 3600000)
[ 768.548339] svc: socket 000000006952397a(inet 0000000015376384), busy=0
[ 768.548372] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 768.548375] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 768.548379] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = 4
[ 768.548381] svc: TCP record, 220 bytes
[ 768.548386] svc: socket 000000006952397a recvfrom(00000000bce6ffde, 4096) = 220
[ 768.548387] svc: TCP final record (220 bytes)
[ 768.548394] svc: svc_authenticate (1)
[ 768.548400] svc: calling dispatcher
[ 768.548401] nfsd_dispatch: vers 4 proc 1
[ 768.548407] nfsv4 compound returned 10008
[ 768.548425] svc: socket 000000006952397a sendto([000000006f8e68ac 48... ], 48) = 48 (addr 141.14.17.51, port=785)
[ 768.548432] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 768.548436] svc: server 00000000aad03597, pool 0, transport 000000006952397a, inuse=2
[ 768.548439] svc: tcp_recv 000000006952397a data 1 conn 0 close 0
[ 768.548442] svc: socket 000000006952397a recvfrom(000000005fae69c4, 4) = -11
[ 768.548443] RPC: TCP recv_record got -11
[ 768.548444] RPC: TCP recvfrom got EAGAIN
[ 768.548446] svc: server 00000000aad03597 waiting for data (to = 3600000)
[ 768.548477] svc: server 000000002b852bb6 waiting for data (to = 3600000)



2019-07-03 12:55:05

by Paul Menzel

Subject: [PATCH] nfsd: Fix overflow causing non-working mounts on 1 TB machines

Date: Wed, 3 Jul 2019 13:28:15 +0200

Since commit 10a68cdf10 (nfsd: fix performance-limiting session
calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
1 TB of memory cannot be mounted anymore. The mount just hangs on the
client.

The gist of commit 10a68cdf10 is the change below.

-avail = clamp_t(int, avail, slotsize, avail/3);
+avail = clamp_t(int, avail, slotsize, total_avail/3);

Here are the macros.

#define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
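
Expanded for this call site (just a sketch based on the two definitions
above, ignoring the extra type checking done by `__careful_cmp()`), every
argument ends up cast to `int`, including `total_avail/3`:

avail = clamp_t(int, avail, slotsize, total_avail/3);
      = min_t(int, max_t(int, avail, slotsize), total_avail/3);
      = __careful_cmp((int)(max_t(int, avail, slotsize)), (int)(total_avail/3), <);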

`total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
the values to `int`, which as a 32-bit type can only hold values from
−2,147,483,648 (−2^31) to 2,147,483,647 (2^31 − 1).

`avail` (the local variable in the function) is just 65536, so the old
`avail/3` fit into `int` and no overflow happened. Before the commit
the assignment would result in 21845, and `num = 4`.

When using `total_avail`, the assignment results in 18446744072226137429
(printed with `%lu`), and `num` is then 4164608182.
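
For illustration, the arithmetic can be reproduced with a small
stand-alone program (a sketch only, not the kernel code: `min_t()` and
`max_t()` are simplified here, and `slotsize` is an assumed value; only
`total_avail` and the 65536 for `avail` are the numbers from above):

#include <stdio.h>

/* Simplified stand-ins for the kernel macros quoted above (no
 * __careful_cmp type checking); the casts to 'type' are the part that
 * matters here. */
#define max_t(type, x, y)          ((type)(x) > (type)(y) ? (type)(x) : (type)(y))
#define min_t(type, x, y)          ((type)(x) < (type)(y) ? (type)(x) : (type)(y))
#define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)

int main(void)
{
    unsigned long total_avail = 8434659328UL; /* value reported on the 1 TB machine */
    unsigned long slotsize    = 2048;         /* assumed; only needs to be < 65536 here */
    unsigned long avail       = 65536;        /* min(NFSD_MAX_MEM_PER_SESSION, total_avail) */

    /* Broken variant: total_avail/3 = 2811553109 does not fit into a
     * 32-bit int and (on common LP64 targets) wraps to -1483414187,
     * which becomes a huge value when stored back into avail. */
    avail = clamp_t(int, avail, slotsize, total_avail / 3);
    printf("clamp_t(int, ...):           %lu\n", avail); /* 18446744072226137429 */

    /* Variant with unsigned long, as in the patch below: the clamp
     * behaves as intended and avail stays at 65536. */
    avail = 65536;
    avail = clamp_t(unsigned long, avail, slotsize, total_avail / 3);
    printf("clamp_t(unsigned long, ...): %lu\n", avail); /* 65536 */

    return 0;
}

On a 64-bit build this prints 18446744072226137429 for the `int`
variant and 65536 for the `unsigned long` variant.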

My next guess is that `nfsd_drc_mem_used` is then exceeded, and the
server thinks there is no memory available anymore for this client.

Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
fixes the issue.

Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but
`num = 4` remains the same.

Fixes: 10a68cdf10 (nfsd: fix performance-limiting session calculation)
Cc: [email protected]
Signed-off-by: Paul Menzel <[email protected]>
---

1. No idea if the `min_t()` arguments also need updating.
2. Instead of `unsigned long`, should `size_t` be used?

fs/nfsd/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 618e66078ee5..1a0cdeb3b875 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1563,7 +1563,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
* Never use more than a third of the remaining memory,
* unless it's the only way to give this client a slot:
*/
- avail = clamp_t(int, avail, slotsize, total_avail/3);
+ avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
num = min_t(int, num, avail / slotsize);
nfsd_drc_mem_used += num * slotsize;
spin_unlock(&nfsd_drc_lock);
--
2.22.0



2019-07-03 15:57:08

by J. Bruce Fields

Subject: Re: [PATCH] nfsd: Fix overflow causing non-working mounts on 1 TB machines

Good catch! And thanks for the detailed explanation. Applying for 5.2
and stable.

On Wed, Jul 03, 2019 at 02:54:43PM +0200, Paul Menzel wrote:
> Date: Wed, 3 Jul 2019 13:28:15 +0200
>
> Since commit 10a68cdf10 (nfsd: fix performance-limiting session
> calculation) (Linux 5.1-rc1 and 4.19.31), shares from NFS servers with
> 1 TB of memory cannot be mounted anymore. The mount just hangs on the
> client.
>
> The gist of commit 10a68cdf10 is the change below.
>
> -avail = clamp_t(int, avail, slotsize, avail/3);
> +avail = clamp_t(int, avail, slotsize, total_avail/3);
>
> Here are the macros.
>
> #define min_t(type, x, y) __careful_cmp((type)(x), (type)(y), <)
> #define clamp_t(type, val, lo, hi) min_t(type, max_t(type, val, lo), hi)
>
> `total_avail` is 8,434,659,328 on the 1 TB machine. `clamp_t()` casts
> the values to `int`, which as a 32-bit type can only hold values from
> −2,147,483,648 (−2^31) to 2,147,483,647 (2^31 − 1).
>
> `avail` (the local variable in the function) is just 65536, so the old
> `avail/3` fit into `int` and no overflow happened. Before the commit
> the assignment would result in 21845, and `num = 4`.
>
> When using `total_avail`, the assignment results in 18446744072226137429
> (printed with `%lu`), and `num` is then 4164608182.
>
> My next guess is that `nfsd_drc_mem_used` is then exceeded, and the
> server thinks there is no memory available anymore for this client.
>
> Updating the arguments of `clamp_t()` and `min_t()` to `unsigned long`
> fixes the issue.
>
> Now, `avail = 65536` (before commit 10a68cdf10 `avail = 21845`), but
> `num = 4` remains the same.
>
> Fixes: 10a68cdf10 (nfsd: fix performance-limiting session calculation)
> Cc: [email protected]
> Signed-off-by: Paul Menzel <[email protected]>
> ---
>
> 1. No idea if the `min_t()` arguments also need updating.
> 2. Instead of `unsigned long`, should `size_t` be used?
>
> fs/nfsd/nfs4state.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 618e66078ee5..1a0cdeb3b875 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1563,7 +1563,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca)
> * Never use more than a third of the remaining memory,
> * unless it's the only way to give this client a slot:
> */
> - avail = clamp_t(int, avail, slotsize, total_avail/3);
> + avail = clamp_t(unsigned long, avail, slotsize, total_avail/3);
> num = min_t(int, num, avail / slotsize);
> nfsd_drc_mem_used += num * slotsize;
> spin_unlock(&nfsd_drc_lock);
> --
> 2.22.0
>


2019-07-03 16:03:53

by Paul Menzel

Subject: Re: [PATCH] nfsd: Fix overflow causing non-working mounts on 1 TB machines

Dear Bruce,


On 7/3/19 5:56 PM, J. Bruce Fields wrote:
> Good catch! And thanks for the detailed explanation. Applying for 5.2
> and stable.

Thanks. Please note that the last part contains some guesses, and I am not
well versed in the terminology. So please feel free to reword the commit
message.


Kind regards,

Paul



2019-07-03 16:43:50

by J. Bruce Fields

Subject: Re: [PATCH] nfsd: Fix overflow causing non-working mounts on 1 TB machines

On Wed, Jul 03, 2019 at 06:03:06PM +0200, Paul Menzel wrote:
> Dear Bruce,
>
>
> On 7/3/19 5:56 PM, J. Bruce Fields wrote:
> > Good catch! And thanks for the detailed explanation. Applying for 5.2
> > and stable.
>
> Thanks. Please note that the last part contains some guesses, and I am not
> well versed in the terminology. So please feel free to reword the commit
> message.

I haven't checked all the arithmetic, but it sounds pretty plausible to
me, and clearly that type was wrong.

--b.