Subject: Re: Question about nfs in infiniband environment
From: Chuck Lever
Date: Tue, 28 Aug 2018 11:26:11 -0400
To: Volker Lieder
Cc: Linux NFS Mailing List

Hi Volker-

> On Aug 28, 2018, at 8:37 AM, Volker Lieder wrote:
>
> Hi,
>
> a short update from our side.
>
> We resized CPU and RAM on the NFS server; the performance is good
> now and the error messages are gone.
>
> Is there a guide to the hardware requirements for a fast NFS server?
>
> Or information on how many nfsd processes are needed for a given
> number of NFS clients?

The nfsd thread count depends on the number of clients _and_ their
workload. There isn't a hard and fast rule.

The default thread count is probably too low for your workload. You
can edit /etc/sysconfig/nfs and find "RPCNFSDCOUNT". Increase it to,
say, 64, and restart your NFS server.

With InfiniBand you also have the option of using NFS/RDMA. Mount
with "proto=rdma,port=20049" to try it.
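For example, a minimal sketch of the thread-count change, assuming a
stock CentOS 7.5 server as in your setup (the sed pattern and the
nfs-server service name are distro-specific):

    # On the NFS server: raise the nfsd thread count to 64
    sed -i 's/^#\?RPCNFSDCOUNT=.*/RPCNFSDCOUNT=64/' /etc/sysconfig/nfs
    systemctl restart nfs-server

    # Verify the running thread count
    cat /proc/fs/nfsd/threads

    # Or change it on the fly, without a restart
    rpc.nfsd 64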
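And a sketch of trying NFS/RDMA. The export path /export is a
placeholder, the server address is taken from your log messages, and
the module names are those used by the CentOS 7 kernel (they differ
on newer kernels):

    # On the server: load the server-side RDMA transport and add an
    # RDMA listener on port 20049 (nfsd must already be running)
    modprobe svcrdma
    echo "rdma 20049" > /proc/fs/nfsd/portlist

    # On each client: load the client-side transport, then mount over RDMA
    modprobe xprtrdma
    mount -t nfs -o proto=rdma,port=20049 172.16.55.221:/export /mnt/export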
> Best regards,
> Volker
>
>> On Aug 28, 2018, at 9:45 AM, Volker Lieder wrote:
>>
>> Hi list,
>>
>> we have a setup with around 15 CentOS 7.5 servers.
>>
>> All are connected via 56 Gbit InfiniBand and installed with the new
>> Mellanox driver. One server (4 cores, 8 threads, 16 GB) is the NFS
>> server for a disk shelf with around 500 TB of data.
>>
>> The server exports 4-6 mounts to each client.
>>
>> Since we added 3 further nodes to the setup, we receive the
>> following messages:
>>
>> On the NFS server:
>> [Tue Aug 28 07:29:33 2018] rpc-srv/tcp: nfsd: sent only 224000 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:30:13 2018] rpc-srv/tcp: nfsd: sent only 209004 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:30:14 2018] rpc-srv/tcp: nfsd: sent only 204908 when sending 630392 bytes - shutting down socket
>> [Tue Aug 28 07:32:31 2018] rpc-srv/tcp: nfsd: got error -11 when sending 524396 bytes - shutting down socket
>> [Tue Aug 28 07:32:33 2018] rpc-srv/tcp: nfsd: got error -11 when sending 308 bytes - shutting down socket
>> [Tue Aug 28 07:32:35 2018] rpc-srv/tcp: nfsd: got error -11 when sending 172 bytes - shutting down socket
>> [Tue Aug 28 07:32:53 2018] rpc-srv/tcp: nfsd: got error -11 when sending 164 bytes - shutting down socket
>> [Tue Aug 28 07:38:52 2018] rpc-srv/tcp: nfsd: sent only 749452 when sending 1048684 bytes - shutting down socket
>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 244 bytes - shutting down socket
>> [Tue Aug 28 07:39:29 2018] rpc-srv/tcp: nfsd: got error -11 when sending 1048684 bytes - shutting down socket
>>
>> On the NFS clients:
>> [229903.273435] nfs: server 172.16.55.221 not responding, still trying
>> [229903.523455] nfs: server 172.16.55.221 OK
>> [229939.080276] nfs: server 172.16.55.221 OK
>> [236527.473064] perf: interrupt took too long (6226 > 6217), lowering kernel.perf_event_max_sample_rate to 32000
>> [248874.777322] RPC: Could not send backchannel reply error: -105
>> [249484.823793] RPC: Could not send backchannel reply error: -105
>> [250382.497448] RPC: Could not send backchannel reply error: -105
>> [250671.054112] RPC: Could not send backchannel reply error: -105
>> [251284.622707] RPC: Could not send backchannel reply error: -105
>>
>> Also, file requests or "df -h" sometimes end up in a stale NFS
>> state which clears after a minute.
>>
>> I googled all the messages and tried different things without
>> success. We are now going to upgrade the CPU power on the NFS
>> server.
>>
>> Do you have any other hints or pointers for where I can look?
>>
>> Best regards,
>> Volker

--
Chuck Lever
chucklever@gmail.com